Dedicated to

A great statistician, linear algebraist and
living legend who is in his 103rd year and still
inspiring the young generation


Padma Vibhushan 
Calyampudi Radhakrishna Rao 
(September 10, 1920-) 


and our beloved friend who left us behind due 
to his untimely demise 


Lal da 
Arbind K. Lal 
(January 01, 1966—March 07, 2021) 


About Rao and Lal 


Prof. Calyampudi Radhakrishna Rao 


Calyampudi Radhakrishna Rao (1920-). Courtesy: Simo Puntanen 


Rao¹ was born on September 10, 1920, in a small town in south India called Hoovina
Hadagali, then in the integrated Madras Province of British India, but now in the
Bellary district of the state of Karnataka. He was the eighth child (out of ten: six boys and
four girls) of his parents, C. D. Naidu and Laxmikanthamma.

CRR joined the course ‘Training in Statistics’ offered by the ISI in January 1941.
Six months later, Rao joined Calcutta University, which had just started a Master’s
degree program in statistics, the first of its kind in India. He completed the course
with first-class honors, first rank and a gold medal in 1943. In August 1946, CRR
visited England on the invitation of J. C. Trevor to work in the Anthropological
Museum and registered for a Ph.D. degree under Ronald Aylmer Fisher (1890-1962).
Based on the work he did at the Museum, CRR received the Ph.D. degree
from Cambridge University in 1948 with the thesis entitled ‘Statistical Problems of
Biological Classifications’.

1 Thanks to Simo Puntanen for the contents of this section on Rao.

Rao returned to India in August 1948, marrying Bhargavi (April 28, 1925—July 
24, 2017) on September 09, 1948. Professor Mahalanobis then offered Rao a Profes- 
sorship at ISI as well as the headship of the Research and Training School in 1949. 
He served in this position till 1963. He was an inspiring teacher: over the years, about
fifty students completed their Ph.D. theses under his guidance. CRR was the
Director of ISI from 1963 to 1972, and, later, he was the Director and Secretary 
of the Indian Statistical Institute till 1976. He then continued in the ISI as a Jawa- 
harlal Nehru Professor till 1984. During this period, he took leave from ISI in 1979 
and went to the United States to accept a University Professorship at the University 
of Pittsburgh. He moved to The Pennsylvania State University in 1988, where he 
was offered several chair positions. He retired from Penn State at the age of 81, but 
continued doing research as the Director of the Center for Multivariate Analysis at 
Penn State until 2008. Since then, he has held the position of Research Professor at the
University at Buffalo.

In 1954, C. R. Rao collected some data from Japan in order to study the long-term 
effects of radiation caused by the Hiroshima and Nagasaki atom bomb explosions. 
In the statistical analysis, he encountered the situation of finding a matrix to replace 
the inverse of X’X, where X is the design matrix appearing in the linear model; here 
the matrix X is not a full column rank matrix and X’X is singular and so the inverse 
was not defined. This led to a pseudo-inverse, which Rao introduced in his 1955 
paper ‘Analysis of dispersion for multiply classified data with an unequal number 
of cells’ in Sankhyā. This was the same year that R. Penrose published his paper
on the generalized inverse [Proc. Cambridge Philos. Soc. 51: 406-413 (1955)]. An
interesting note to be made here is that the notion of pseudo-inverse, initially defined
for operators, was introduced by Moore in 1920, the year in which Rao was born.
C. R. Rao then discovered the key condition for a generalized inverse G of a matrix A,
namely that it satisfies the equation AGA = A, and introduced the notation G = A⁻. The calculus
of g-inverses and the unified theory of linear estimation were then presented in 1962.
The subject of g-inverses was further developed with the help of his colleagues at 
the ISI, leading to the full-length monograph with Sujith Kumar Mitra, Generalized 
Inverse of Matrices and its Applications, published in 1971. Using the concept of the 
g-inverse, Rao developed a unified theory for linear estimation, noting that g-inverses 
were particularly helpful with explicit expressions for projectors. He introduced 
the Inverse Partitioned Matrix (IPM) method [Unified theory of linear estimation, 
Sankhya Ser. A 33: 371-394 (1972)] for computing summary statistics for the general 
linear model. Linear models, which have wide applications in statistics, have also 
provided outlets for some basic research in Linear Algebra: the special issues on 
linear algebra and statistics of Linear Algebra and its Applications [Vols. 67 (1985), 
70 (1985), 82 (1986), 127 (1990), 176 (1992), 210 (1994)] bear witness to this.
Some of Rao’s contributions are a general matrix inequality, separations for singular 
values and eigenvalues, matrix approximations and reduction of dimensionality, and 
Kantorovich type inequalities. 

C. R. Rao has developed statistical estimation theory in small samples, extending 
the scope of statistical methods in practice. Many results in this area bear 
his name, such as the Cramer-Rao lower bound, Rao-Blackwell theorem, Rao- 
Blackwellization, Fisher-Rao theorem, Rao’s second order efficiency and the Geary- 
Rao theorem on Pitman closeness criterion. Rao’s orthogonal arrays were a major 
contribution to research in coding theory and in experimental design—especially 
in industrial experimentation as developed by Genichi Taguchi. Rao has made 
pioneering contributions to the development of multivariate statistical analysis, 
especially with the concept of Rao’s quadratic entropy, Rao’s U-test, Rao’s F- 
approximation to the likelihood-ratio criterion and canonical coordinates. He has 
also introduced a broad class of asymptotic tests of hypothesis, including Rao’s score 
statistic and the Neyman-Rao statistic; he has used differential-geometric techniques 
in discussing problems of statistical inference, based on Rao’s distance function. 


2023 International Prize in Statistics. 


On 1 April 2023, it was announced on the website
https://statprize.org/2023-International-Prize-in-Statistics-Awarded-to-C-R-Rao.cfm that C. R. Rao, a professor
whose work more than seventy-five years ago continues to exert a profound influence
on science, has been awarded the 2023 International Prize in Statistics. The Award
citation further reads: “In his remarkable 1945 paper published in the Bulletin of the 
Calcutta Mathematical Society, C. R. Rao demonstrated three fundamental results 
that paved the way for the modern field of statistics and provided statistical tools 
heavily used in science today”. Here is the complete bibliographical information 
of the paper: Information and accuracy attainable in the estimation of statistical 
parameters, Bulletin of Calcutta Mathematical Society, Vol 37, No 3, 81-91, 1945. 

The Prize was established in 2016 and is awarded once every two years to an
individual or team “for major achievements using statistics to advance science, technology
and human welfare”. The Prize is managed by a foundation comprising
representatives of the five major statistical organizations working cooperatively to
develop this prestigious award: the American Statistical Association, the Institute
of Mathematical Statistics, the International Biometric Society, the International Statistical
Institute and the Royal Statistical Society. The Prize is regarded as the statistics
equivalent of the Nobel Prize.
Arbind Kumar Lal


Arbind Kumar Lal (1966-2021). Courtesy: Sukanta Pati 


Lal², known as Lalda to the younger generation, had a long association with
CARAMS, MAHE and its activities. He was known for his readiness to help everyone
(students, teaching and nonteaching staff) in trouble, and for his ability to resolve
issues within the department. Professor Lal was popularly tagged as the ‘permanent
HOD’ by his colleagues. Born in Uttara (Madhubani district, Bihar) to Smt. Sushila
Devi and Shri Lekh Narayan Lal, he was brought up to be a thorough gentleman
and a person with a huge heart.

He completed his schooling at Sainik School, Tilaiya, in 1983 and obtained his
BA (Honors) in Mathematics in 1986 from Hansraj College, Delhi. He went on to
do his MStat (1986-1988) at ISI Delhi, and this is where his growth as a mathematician
started. Frank, thoughtful and fond of sports, he made many friends for life there.
He completed his Ph.D. (1988-1993) at the same place, ISI Delhi. He was a visiting fellow at TIFR
Mumbai for a year, during which he worked on combinatorics. After that, he became
a visiting fellow at HRI Allahabad for two years, during which he worked on coding
theory. He then joined IIT Kanpur as a faculty member in the Department of Mathematics
in 1996 and served the department in various capacities till his untimely demise in
2021.

He guided seven Ph.D. students and his contribution ‘A q-analogue of the Distance
Matrix of a Tree’, Linear Algebra and its Applications, 416, 2-3, 2006, 799-814,
has motivated many researchers to work on various aspects of the distance matrix. 
He was an editor of the journal ‘Proceedings Mathematical Sciences’ published by 
Springer and Indian Academy of Sciences. 

CARAMS, MAHE salutes him and documents his tribute, besides praying to the 
Almighty to bless Rao with good health, prosperity and undiminished energy to 
inspire young generations of the future. 


2 Thanks to Sukanta Pati for the contents of this section on Lal. 


Foreword 


It gives me immense pleasure that a special volume dedicated to the conferences 
organized by the ‘Center for Advanced Research in Applied Mathematics & Statistics,
MAHE’ is being published by Springer Nature’s Indian Statistical Institute
series in the form of a book-volume in honor of Profs. Calyampudi Radhakrishna 
Rao and Arbind K. Lal. CARAMS, MAHE initiated the activities in honor of Padma 
Vibhushan C. R. Rao, a great statistician and a living legend, in the year 2020. Due 
to the prevailing Covid situation, the events could not be organized in the normal 
physical format, but Rao’s hundred years of life was celebrated by organizing the 
‘International Conference on Applied Linear Algebra, Probability and Statistics’ in 
the online format. However, the conferences IWMS 2021 and ICLAA 2021, which 
were earlier scheduled in December 2020, have been rescheduled and organized 
in the hybrid format in December 2021. I congratulate the entire team led by the 
scientific advisory committee consisting of Ravindra B. Bapat, Manjunatha Prasad 
Karantha, Steve Kirkland, and Simo Puntanen for the great success of the events 
IWMS 2021 and ICLAA 2021 which brought several collaborators, disciples, and 
the followers of Rao together and connected with the young minds of India and 
abroad, remembering the achievements of Rao and further developments. Arbind K. 
Lal, an eminent young mathematician serving at IIT Kanpur, who had been contin- 
uously associated with the ICLAA series of conferences of MAHE, left us behind 
due to his untimely demise, and CARAMS, MAHE dedicated a session of ICLAA 
2021 to remember him. 

The book-volume ‘Applied Linear Algebra, Probability, and Statistics—A Volume 
in Honor of C. R. Rao and Arbind K. Lal’ is a part of several outcomes of the series 
of activities held by CARAMS, MAHE in honor of Rao. I understand that several 
associates of Rao and Lal and young scholars have contributed their work in the form 
of book-chapters. I thank the entire editorial team and the authors for their efforts
and contribution in the preparation of the book-volume ALAPS being published in 
the Indian Statistical Institute series of Springer Nature. 


April 2023 Narayana Sabhahit 
Pro Vice Chancellor—Technology and Science 

Manipal Academy of Higher Education 

Manipal, India 


Preface 


Linear Algebra, Probability Theory and Statistics are subjects having applications 
in almost every branch of science including Biology, Economics, Engineering and 
Physics, which are consistent with the focus research activities of CARAMS, MAHE. 
ICLAA 2021 is the fourth in its series, which started with CMTGIM 2012, and
the fifth is ICLAA 2023 scheduled in December 2023. The initial intention was to
dedicate the common day between IWMS 2020 and ICLAA 2020 to honor C. R. Rao,
a living legend among Indian Statisticians, who was celebrating the birth centenary 
in the year 2020. Unfortunately, the events could not be organized as per schedule in 
the normal format due to the prevailing situation. With the hope of having a normal 
life in 2021, the events in the physical formats have been rescheduled to December 
2021, and CARAMS organized ALAPS 2020 (International Conference on Applied 
Linear Algebra, Probability, and Statistics) in the online format (164 participants—25 
invited, 40 contributory), in which many collaborators, disciples and followers of Rao 
participated. IWMS 2021 (192 participants: 20 invited, 14 contributory)
and ICLAA 2021 (242 participants: 25 invited, 28 contributory) were then organized
successfully in the hybrid format, and it was decided to publish a special issue of the ‘AKCE
International Journal of Graphs and Combinatorics’ and a book-volume, ‘Applied
Linear Algebra, Probability, and Statistics: A Volume in Honor of Rao and Lal’
(ALAPS), in the ISI series of Springer Nature.

The present volume is dedicated to Prof. Rao, who completed 100 years of
legendary life and continues to inspire all of us, and Prof. Arbind Lal, who sadly
departed too early. The book focuses on, but is not limited to, linear algebra, statistics,
matrices, graphs and applications, the areas to which Rao and Lal contributed.
The volume is rich with contributions from collaborators, students, colleagues and 
admirers of Profs. Rao and Lal. 

Many chapters feature new findings due to applications of matrix and graph
methods, while others present rediscoveries of the subject using new methods. Chapters
from Simo Puntanen, Stephen Haslett and their collaborators reflect the continuation
of Rao’s legacy in the use of matrix methods in statistics, whereas two from
TES Raghavan describe the use of matrix-graph methods in the computation of the
nucleolus and Shapley value, which appear in game theory. The contributions from
Raghavan are based on his lectures delivered at the workshop and conferences held
in honor of Rao.

Some chapters are expository, summarizing the development of subjects and 
presenting new open problems. BLS Prakasa Rao’s ‘On some matrix versions of 
covariance, harmonic mean and other inequalities: an overview’ is one such article. 
Jeffrey Hunter’s contribution ‘The impact of Prof. Rao’s research used in solving
problems in applied probability’ describes how Rao’s work influenced the author’s
research.

Neogy et al. (On some special matrices and its applications in linear complemen- 
tarity problem) and Eagambaram (Characterization of Q-matrices using bordered 
matrix algorithm) discuss special classes of matrices having applications in the linear 
complementarity problem. The chapters by Saumyadipta Pyne and Sudeep Bapat 
present key applications of statistical data analysis to real-life problems. The book 
also includes one of Lal’s last works ‘Some observations on algebraic connectivity 
of graphs’. 

With many more chapters on Generalized Inverses, Matrix Analysis, Matrices and 
Graphs, and the History of Ancient Mathematics, this volume offers a diverse array 
of mathematical results, techniques, applications and perspectives. Linear Algebra, 
Probability and Statistics are important branches of mathematics having applications 
in every dimension of science. The book promises to be especially rewarding for 
readers with an interest in the focus areas of ALAPS. 


Manipal, Karnataka, India Ravindra B. Bapat 
April 2023 Manjunatha Prasad Karantha 
Stephen J. Kirkland 

Samir Kumar Neogy 

Sukanta Pati 

Simo Puntanen 
1.1 Introduction


C. R. Rao has extensively used matrix algebra in his study of unified theory of linear 
models. He has shown that some of the important results in matrix algebra can be 
derived from statistical results in Rao [19]. Dey et al. [6] gave proofs of some results 
in matrix algebra using ideas derived from statistics. Mitra [13] gave statistical proofs 
of some results for non-negative definite matrices. Rao [18] derived seven inequalities 
in statistical estimation theory one of which deals with the harmonic mean of positive 
random variables. A generalization of these results for random matrices was derived 
in Prakasa Rao [17]. Rao [19] gave an alternate proof of this result using ideas from 
statistics. Kimeldorf and Sampson [11] derived a class of covariance inequalities. 
Matrix versions of these inequalities were obtained in Prakasa Rao [16]. We will 
now give a short overview of some of these results dealing with the harmonic mean 
for positive random variables and some covariance inequalities and the recent work 
of Gibilisco and Hansen [9] generalizing the results of Rao [18] and Prakasa Rao 
[17] and present some open problems. 
1.2 Covariance Inequalities


Let $A$ be an open convex subset of $\mathbb{R}^2$ and $(X, Y)$ be a bivariate random vector with
support $A$. Further suppose that $E(X^2)$ and $E(Y^2)$ are finite. Suppose $g(x, y)$ is a
function satisfying the condition

\[ (x_2 - x_1)(y_2 - y_1) \le [g(x_2, y_2) - g(x_1, y_1)]^2, \quad (x_i, y_i) \in A,\ i = 1, 2. \]

Kimeldorf and Sampson [11] proved that

\[ \operatorname{Cov}(X, Y) \le \operatorname{Var}[g(X, Y)]. \]


A matrix version of this result was proved in Prakasa Rao [16]. We will now 
review this result and give some applications. 

Hereafter, we consider only random matrices with real entries. 

Suppose $\xi$ and $\eta$ are two random matrices of order $r \times s$ and further suppose that
all the entries of these matrices have finite second moments. Let $E(\xi)$ be the matrix
of expectations of the entries of the matrix $\xi$. Let $\xi'$ denote the transpose of the matrix
$\xi$. For any two random matrices $\xi$ and $\eta$, define the covariance and variance by the
following relations:

\[ \operatorname{Cov}(\xi, \eta) = \tfrac{1}{2} E\big[(\xi - E(\xi))(\eta - E(\eta))' + (\eta - E(\eta))(\xi - E(\xi))'\big] \tag{1.1} \]

and

\[ \operatorname{Var}[\xi] = E\big[(\xi - E(\xi))(\xi - E(\xi))'\big]. \tag{1.2} \]


It is clear from the definitions given above that the matrix $\operatorname{Var}(\xi)$ is a non-negative
definite matrix and that the covariance matrix $\operatorname{Cov}(\xi, \eta)$ is a symmetric matrix. If
the matrices $\xi$ and $\eta$ are random commuting symmetric matrices (and hence $r = s$),
then

\[ \operatorname{Cov}(\xi, \eta) = E\big[(\xi - E(\xi))(\eta - E(\eta))'\big] = E(\xi\eta) - E(\xi)E(\eta), \tag{1.3} \]

which is analogous to the standard definition of the covariance for two random
variables. For any two square matrices $A$ and $B$ of the same order, we say that
$A \le B$ if the matrix $B - A$ is non-negative definite. We say that $A < B$ if $B - A$ is
positive definite.

The following result is proved in Prakasa Rao [16]. 


Theorem 1.1 Let $\xi$ and $\eta$ be random matrices of order $r \times s$ such that $P[(\xi, \eta) \in A] = 1$
where $A$ is a subset of $\mathbb{R}^{r \times s} \times \mathbb{R}^{r \times s}$. Let $g(A, B)$ be a function defined over
the space $\mathbb{R}^{r \times s} \times \mathbb{R}^{r \times s}$ taking values in $\mathbb{R}^{r \times s}$. Then

\[ \operatorname{Cov}(\xi, \eta) \le \operatorname{Var}[g(\xi, \eta)] \tag{1.4} \]

if and only if, for all $(A_i, B_i) \in A$, $i = 1, 2$,

\[ \tfrac{1}{2}\big[(A_2 - A_1)(B_2 - B_1)' + (B_2 - B_1)(A_2 - A_1)'\big] \le [g(A_2, B_2) - g(A_1, B_1)][g(A_2, B_2) - g(A_1, B_1)]'. \tag{1.5} \]


For proof, see Prakasa Rao [16]. We now consider some interesting special cases 
of this theorem. 


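Corollary 1.1 Suppose $\xi$ and $\eta$ are random matrices of order $r \times r$. Let $g(A, B) = \frac{A + B}{2}$.
It is easy to check that the condition (1.5) holds using the fact that

\[ \Big[\frac{A_2 - A_1}{2} - \frac{B_2 - B_1}{2}\Big]\Big[\frac{A_2 - A_1}{2} - \frac{B_2 - B_1}{2}\Big]' \ge 0. \]

Hence, an application of Theorem 1.1 shows that

\[ \operatorname{Cov}(\xi, \eta) \le \operatorname{Var}\Big[\frac{\xi + \eta}{2}\Big]. \tag{1.6} \]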
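As a quick sanity check of Corollary 1.1, the following sketch looks at the $1 \times 1$ case, where the random matrices reduce to real random variables $X$ and $Y$ and (1.6) reads $\operatorname{Cov}(X, Y) \le \operatorname{Var}[(X + Y)/2]$. The simulated sample and the helper names `mean`, `cov` and `var` are illustrative assumptions, not part of the original text.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Population covariance of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def var(xs):
    return cov(xs, xs)


random.seed(0)
xs = [random.uniform(0.0, 5.0) for _ in range(1000)]
ys = [x + random.gauss(0.0, 1.0) for x in xs]  # correlated with xs

lhs = cov(xs, ys)                                  # Cov(X, Y)
rhs = var([(x + y) / 2 for x, y in zip(xs, ys)])   # Var[(X + Y) / 2]
gap = rhs - lhs                                    # equals Var[(X - Y) / 2]
```

The gap is non-negative because it coincides with $\operatorname{Var}[(X - Y)/2]$, which is the scalar form of the identity used in the proof of the corollary.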


Corollary 1.2 Suppose $\xi$ and $\eta$ are positive definite matrices of order $r \times r$. Then $A^{1/2}$
and $B^{1/2}$ are symmetric matrices. Suppose they commute. Define $g(A, B) = A^{1/2}B^{1/2} = B^{1/2}A^{1/2}$
by commutativity. It can be checked that the condition (1.5) holds and hence,
by Theorem 1.1,

\[ \operatorname{Cov}(\xi, \eta) \le \operatorname{Var}[\xi^{1/2}\eta^{1/2}]. \tag{1.7} \]


This inequality can also be obtained as a consequence of the following matrix version
of the Cauchy-Schwarz inequality (cf. Ibragimov and Khasminskii [10, p. 73]).


Theorem 1.2 Let $\xi$ and $\eta$ be two random matrices of order $r \times s$ with the entries of
these matrices having finite variances. Further suppose that $E(\xi) = E(\eta) = 0$ and
$E(\eta\eta') > 0$. Then

\[ E(\xi\xi') \ge E(\xi\eta')[E(\eta\eta')]^{-1}E(\eta\xi'). \tag{1.8} \]


Example 1 


It is easy to see that, for any positive definite matrix $D$,

\[ D + D^{-1} \ge 2I \tag{1.9} \]

since $(D^{1/2} - D^{-1/2})^2 \ge 0$. In particular, for any two positive definite matrices $A_1$ and
$A_2$,

\[ (A_1 - A_2)(A_1^{-1} - A_2^{-1}) = 2I - A_2A_1^{-1} - A_1A_2^{-1} = 2I - D - D^{-1} \le 0 \tag{1.10} \]

where $D = A_2A_1^{-1}$. Let $g(A, B) = AB$. It can be checked that the condition (1.5)
holds for the function $g(A, B)$ and hence, by Theorem 1.1,

\[ \operatorname{Cov}[\xi, \xi^{-1}] \le \operatorname{Var}[g(\xi, \xi^{-1})] = 0. \tag{1.11} \]

This in turn implies that

\[ E(\xi)E(\xi^{-1}) \ge I \tag{1.12} \]

for any positive definite random matrix $\xi$. This was proved by alternate arguments
in Bobrovsky et al. [4].


Remark 1.1 For any two positive definite matrices $A$ and $B$, define

\[ A \nabla B = \frac{A + B}{2}, \tag{1.13} \]

\[ A \# B = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}, \tag{1.14} \]

and

\[ A\,!\,B = \Big[\tfrac{1}{2}(A^{-1} + B^{-1})\Big]^{-1}. \tag{1.15} \]

Ando [2] proved that

\[ A\,!\,B \le A \# B \le A \nabla B \tag{1.16} \]

generalizing the arithmetic mean-geometric mean-harmonic mean inequality for positive
real numbers. It was also proved by Ando [1] that

\[ A \nabla B \ge \Big[\tfrac{1}{2}(A^{q} + B^{q})\Big]^{1/q} \tag{1.17} \]

for $\tfrac{1}{2} \le q \le 1$. Under the commutativity assumption on $A^{1/2}$ and $B^{1/2}$, it can be checked
that $A \# B = (AB)^{1/2} = A^{1/2}B^{1/2}$. It was shown earlier that Theorem 1.1 holds for the
function $g_1(A, B) = A \nabla B$ and for the function $g_2(A, B) = A^{1/2}B^{1/2}$ under the commutativity
condition. Theorem 1.1 does not hold for the function $g_3(A, B) = A\,!\,B$, even
for real-valued random variables, as shown by Kimeldorf and Sampson [11]. They have
proved that Theorem 1.1 holds for the function

\[ g(x, y) = \Big[\tfrac{1}{2}(x^{p} + y^{p})\Big]^{1/p}, \quad 0 < p < 1, \]

for real-valued random variables.
It is an open problem to find whether Theorem 1.1 holds for the function


1 
ga(A, B) =[5(Al + BIN, a@= 1, 
and for the function 
g5(A, B) = A#B 


in view of Ando’s inequality (1.17). 


For other remarks on such inequalities for random matrices, see Prakasa Rao [16]. 


1.3 Harmonic Mean Inequalities


Rao [18] proved the following result. 


Theorem 1.3 If $X$ and $Y$ are positive-valued random variables with finite expectations,
then

\[ E\Big[\Big(\frac{1}{X} + \frac{1}{Y}\Big)^{-1}\Big] \le \Big(\frac{1}{E(X)} + \frac{1}{E(Y)}\Big)^{-1}. \tag{1.18} \]

Rao’s proof involves the use of Hölder’s inequality and the dominated convergence
arguments. We have extended this result to positive definite random matrices in
Prakasa Rao [17].
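To make the inequality (1.18) concrete, here is a small exact computation with rational arithmetic. The two-point joint distribution is a hypothetical example chosen for illustration only, and the helper name `parallel_sum` is an assumption of this sketch.

```python
from fractions import Fraction


def parallel_sum(x, y):
    """Return (1/x + 1/y)^(-1), the quantity appearing in (1.18)."""
    return 1 / (1 / x + 1 / y)


# Hypothetical joint law: (X, Y) = (1, 4) or (4, 1), each with probability 1/2.
outcomes = [(Fraction(1), Fraction(4)), (Fraction(4), Fraction(1))]
prob = Fraction(1, 2)

lhs = sum(prob * parallel_sum(x, y) for x, y in outcomes)  # E[(1/X + 1/Y)^(-1)]
ex = sum(prob * x for x, _ in outcomes)                    # E(X)
ey = sum(prob * y for _, y in outcomes)                    # E(Y)
rhs = parallel_sum(ex, ey)                                 # (1/E(X) + 1/E(Y))^(-1)
```

Here the left side equals 4/5 and the right side equals 5/4, so the inequality is strict for this distribution.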


Theorem 1.4 If $A$ and $B$ are almost surely positive definite random matrices
with $E(A)$, $E(A^{-1})$, $E(B)$ and $E(B^{-1})$ finite and either $E(A^{-1}) - (E[A])^{-1}$ or
$E(B^{-1}) - (E[B])^{-1}$ is non-singular, then

\[ E\big[(A^{-1} + B^{-1})^{-1}\big] < \big((E[A])^{-1} + (E[B])^{-1}\big)^{-1}. \tag{1.19} \]


Theorem 1.4 can be proved in several ways. The first method uses the Jensen 
inequality for random matrices. For the notion of convexity for functions defined on 
the space of matrices, see Davis [5]. 


Proof For any fixed positive definite matrix $M$, the matrix-valued function $g(X) = (M + X^{-1})^{-1}$
is concave in $X$ over the class of positive definite matrices $X$, and
hence

\[ E\big[(M + X^{-1})^{-1}\big] \le \big[M + (E(X))^{-1}\big]^{-1}. \]

Taking expectations with respect to the probability measure of the random matrix $M$
on both sides, the result follows.


Other proofs of Theorem 1.4 are based on the following lemmas which are of 
independent interest. 
Lemma 1.1 For any two positive definite matrices $G$ and $H$,

\[ G^{-1} - H^{-1} \ge H^{-1}(H - G)H^{-1}. \tag{1.20} \]

Lemma 1.2 Let $G$ be any positive definite random matrix such that $E[G]$ is finite and
non-singular. Then

\[ E[G^{-1}] \ge (E[G])^{-1}. \tag{1.21} \]

Lemma 1.3 Let $G$ and $H$ be positive definite random matrices with finite expectations.
Then

\[ E(GH^{-1}G) \ge E[G](E[H])^{-1}E[G]. \tag{1.22} \]

Lemma 1.1 is a consequence of the fact that

\[ (H^{-1} - G^{-1})\,G\,(H^{-1} - G^{-1}) \ge 0. \]

Lemma 1.2 follows from the inequality proved in (1.12) given earlier (see
Bobrovsky et al. [4] and Prakasa Rao [16]).

Lemma 1.3 is a consequence of the fact that the function $g(A, B) = AB^{-1}A$ is
convex over the class of positive definite random matrices. It is also a consequence
of Corollary 3.1 in Ando [1]. Two alternate proofs of Theorem 1.4 can be given 
using the lemmas stated above. For details, see Prakasa Rao [17] and Rao [19]. An 
application of Theorem 1.4 to statistical inference is given in Prakasa Rao [17]. 
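The matrix harmonic mean inequality of Theorem 1.4 and the inequality of Lemma 1.2 can be explored numerically. The sketch below works with $2 \times 2$ symmetric positive definite matrices under the empirical distribution of a simulated sample, so the expectations are plain averages. The sampling scheme and all helper names (`parallel`, `is_nnd`, `random_pd`, and so on) are assumptions made for this illustration, and only the non-strict form of the inequality is examined.

```python
import random


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def inv(a):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def parallel(a, b):
    """Return (a^{-1} + b^{-1})^{-1}."""
    return inv(add(inv(a), inv(b)))


def is_nnd(a, tol=1e-12):
    """A symmetric 2 x 2 matrix is non-negative definite iff trace and determinant are >= 0."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return a[0][0] + a[1][1] >= -tol and det >= -tol


def random_pd(rng):
    """Symmetric matrix with diagonal in [1, 3] and |off-diagonal| <= 0.9, hence positive definite."""
    b = rng.uniform(-0.9, 0.9)
    return [[rng.uniform(1.0, 3.0), b], [b, rng.uniform(1.0, 3.0)]]


def average(mats):
    total = [[0.0, 0.0], [0.0, 0.0]]
    for m in mats:
        total = add(total, m)
    return scale(1.0 / len(mats), total)


rng = random.Random(1)
pairs = [(random_pd(rng), random_pd(rng)) for _ in range(500)]

mean_a = average([a for a, _ in pairs])
mean_b = average([b for _, b in pairs])
lhs = average([parallel(a, b) for a, b in pairs])   # E[(A^-1 + B^-1)^-1]
rhs = parallel(mean_a, mean_b)                      # ((E A)^-1 + (E B)^-1)^-1
gap = sub(rhs, lhs)                                 # should be non-negative definite
lemma_gap = sub(average([inv(a) for a, _ in pairs]), inv(mean_a))  # E[A^-1] - (E A)^-1
```

Calling `is_nnd(gap)` and `is_nnd(lemma_gap)` checks the two orderings in the sense of non-negative definiteness used throughout this chapter.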


1.4 General Mean Inequalities 


Let $x$ and $y$ be positive real numbers. The arithmetic, geometric, harmonic and
logarithmic means are defined by

\[ m_a(x, y) = \frac{x + y}{2}, \qquad m_g(x, y) = \sqrt{xy} \]

and

\[ m_h(x, y) = \frac{2}{x^{-1} + y^{-1}}, \qquad m_l(x, y) = \frac{x - y}{\log x - \log y}. \]

Suppose $X$ and $Y$ are positive random variables. Linearity of the expectation operator
implies

\[ E[m_a(X, Y)] = m_a(E(X), E(Y)). \]

The Cauchy-Schwarz inequality implies that

\[ E[m_g(X, Y)] \le m_g(E(X), E(Y)). \]

Rao [18, 19] and Prakasa Rao [17] proved that

\[ E[m_h(X, Y)] \le m_h(E(X), E(Y)) \]

as explained in the last section. Gibilisco and Hansen [9] investigated the generality of
these results including the question of whether such a result holds for the logarithmic
mean. We will now give a brief survey of some of their results.


Definition 1.1 A bivariate mean is a function $m : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ such that

(i) $m(x, x) = x$;

(ii) $m(x, y) = m(y, x)$;

(iii) $x < y$ implies that $x < m(x, y) < y$;

(iv) $x < x'$ and $y < y'$ implies that $m(x, y) < m(x', y')$;

(v) the function $m(x, y)$ is continuous; and

(vi) the function $m(x, y)$ is positively homogeneous, that is, $m(tx, ty) = t\,m(x, y)$
for $t > 0$.


The definition of the bivariate mean given above is due to Petz and Temesi [15]. 
Let M be the set of all bivariate means as defined above. 


Definition 1.2 Let $F$ denote the class of functions $f : \mathbb{R}_+ \to \mathbb{R}_+$ such that

(i) $f$ is continuous;

(ii) $f$ is monotone increasing;

(iii) $f(1) = 1$;

(iv) $t f(t^{-1}) = f(t)$ for $t > 0$.


It can be checked that there is a bijection between the classes M and F given by 
the formulae 


\[ m_f(x, y) = y f(y^{-1}x) \quad \text{and} \quad f_m(t) = m(1, t) \tag{1.23} \]


for positive numbers x, y and t. We now give some examples of means. 
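(i) The function $f(x) = \frac{1 + x}{2}$ corresponds to the arithmetic mean $\frac{x + y}{2}$.

(ii) The function $f(x) = \sqrt{x}$ corresponds to the geometric mean $\sqrt{xy}$.

(iii) The function $f(x) = \frac{2x}{x + 1}$ corresponds to the harmonic mean $\frac{2}{x^{-1} + y^{-1}}$.

(iv) The function $f(x) = \frac{x - 1}{\log x}$ corresponds to the logarithmic mean $\frac{x - y}{\log x - \log y}$.

(v) The function $f(x) = \frac{x^{\beta} + x^{1 - \beta}}{2}$, $0 < \beta < 1$, corresponds to the mean $\frac{x^{\beta}y^{1 - \beta} + x^{1 - \beta}y^{\beta}}{2}$.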
Observe that all the functions described in (i)-(v) given above belong to the class 
F and they are concave. Gibilisco and Hansen [9] proved the following result. 


Theorem 1.5 Let $f$ be a function in the class $F$. The inequality

\[ E[m_f(X, Y)] \le m_f(E(X), E(Y)) \tag{1.24} \]


holds for arbitrary positive random variables X and Y if and only if f is concave. 
For a proof of Theorem 1.5, see Gibilisco and Hansen [9]. In particular, the result
holds for the logarithmic mean for positive random variables. They extended Theorem 1.5
to operator means in the sense of Kubo and Ando [20]. We refer the reader
to Gibilisco and Hansen [9] for details.
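As an informal illustration of Theorem 1.5 for the logarithmic mean, the sketch below compares both sides of (1.24) under the empirical distribution of a simulated sample, so that each expectation is a sample average. The uniform sampling range and the names `log_mean`, `sample`, `lhs` and `rhs` are assumptions of this example.

```python
import math
import random


def log_mean(x, y):
    """Logarithmic mean (x - y) / (log x - log y), with the value x when x == y."""
    return x if x == y else (x - y) / (math.log(x) - math.log(y))


rng = random.Random(2)
sample = [(rng.uniform(0.5, 10.0), rng.uniform(0.5, 10.0)) for _ in range(2000)]

n = len(sample)
lhs = sum(log_mean(x, y) for x, y in sample) / n   # E[m(X, Y)] under the empirical law
rhs = log_mean(sum(x for x, _ in sample) / n,
               sum(y for _, y in sample) / n)      # m(E(X), E(Y))
```

Since the generator of the logarithmic mean is concave, the theorem predicts `lhs <= rhs` for any such sample.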

An $n$-mean on a set $X$ is an $n$-ary operation $m : X^n \to X$ which satisfies the condition

\[ m(x, x, \ldots, x) = x, \quad x \in X. \]

It is said to be symmetric if

\[ m(x_{\sigma(1)}, \ldots, x_{\sigma(n)}) = m(x_1, \ldots, x_n) \]

for $\sigma \in S_n$, where $S_n$ is the symmetric group of degree $n$.

Open Problems: (1) In a recent paper, Gibilisco [8] posed the following problem:
suppose $f$ is a function in $F$ as given by Definition 1.2 and $X_1, \ldots, X_n$ are positive
random variables defined on a probability space $(\Omega, \mathcal{G}, P)$. Is it true that the Jensen
inequality holds for $n$-means, that is, if $f$ is concave, does it hold that

\[ E(m_f(X_1, \ldots, X_n)) \le m_f(E(X_1), \ldots, E(X_n)) \]

for all positive random variables $X_1, \ldots, X_n$? Jensen’s inequality for bivariate means
has been proved for operator means in Kubo-Ando [20] (cf. Gibilisco and Hansen
[9], Prakasa Rao [17] and Rao [19]). Can it be generalized to $n$-means in the
noncommutative setting?


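(2) Given a set of positive numbers $x_1, \ldots, x_n$ and a positive integer $k \le n$, define

\[ M^{(k)}(x_1, \ldots, x_n) = \left( \frac{\sum_{1 \le i_1 < \cdots < i_k \le n} x_{i_1} \cdots x_{i_k}}{\binom{n}{k}} \right)^{1/k}. \]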


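It is known that

\[ \big(M^{(k-1)}(x_1, \ldots, x_n)\big)^{k-1}\,\big(M^{(k+1)}(x_1, \ldots, x_n)\big)^{k+1} \le \big(M^{(k)}(x_1, \ldots, x_n)\big)^{2k}, \quad 1 \le k \le n - 1, \]

where the factor involving $M^{(0)}$ is read as 1 when $k = 1$
(cf. Hardy et al. [21]) which in turn implies that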
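\[ M^{(1)}(x_1, \ldots, x_n) \ge M^{(2)}(x_1, \ldots, x_n) \ge \cdots \ge M^{(n)}(x_1, \ldots, x_n). \]

Note that $M^{(1)}$ is the arithmetic mean and $M^{(n)}$ is the geometric mean. Let $I$ be a
nondegenerate interval of $\mathbb{R}$ and $n \ge 1$. A function $M : I^n \to \mathbb{R}$ is called an $n$-variable
mean on $I$ if

\[ \min(x_1, \ldots, x_n) \le M(x_1, \ldots, x_n) \le \max(x_1, \ldots, x_n), \quad x_1, \ldots, x_n \in I. \]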
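The symmetric means $M^{(k)}$ of Open Problem (2) can be computed directly from their definition, which makes the chain from the arithmetic mean $M^{(1)}$ down to the geometric mean $M^{(n)}$ easy to inspect. The function name `symmetric_mean` and the sample values are assumptions of this sketch, which requires Python 3.8 or later for `math.comb` and `math.prod`.

```python
from itertools import combinations
from math import comb, prod


def symmetric_mean(xs, k):
    """Return M^(k)(xs): the k-th root of the average of all k-fold products."""
    total = sum(prod(c) for c in combinations(xs, k))
    return (total / comb(len(xs), k)) ** (1.0 / k)


xs = [1.0, 2.0, 4.0, 8.0]
chain = [symmetric_mean(xs, k) for k in range(1, len(xs) + 1)]
```

The list `chain` starts at the arithmetic mean 3.75, ends at the geometric mean $64^{1/4}$, and is non-increasing. Each $M^{(k)}$ lies between the smallest and largest entries of `xs`, so it is also an $n$-variable mean in the sense just defined.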


Let $f : I \to \mathbb{R}$ be a continuous and strictly increasing function. The $n$-variable quasi-arithmetic
mean $A_n^f : I^n \to I$ is defined by

\[ A_n^f(x_1, \ldots, x_n) = f^{-1}\Big( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \Big), \quad x_1, \ldots, x_n \in I, \]

where $f^{-1}$ is the inverse of the function $f$. The function $f$ is called the generator
of the quasi-arithmetic mean $A_n^f$. The arithmetic mean, the geometric mean and the
harmonic mean are quasi-arithmetic means corresponding to the functions
$f : \mathbb{R} \to \mathbb{R}$, $f(x) = x$, $x \in \mathbb{R}$; $f : (0, \infty) \to \mathbb{R}$, $f(x) = \log x$, $x > 0$; and
$f : (0, \infty) \to \mathbb{R}$, $f(x) = x^{-1}$, $x > 0$, respectively.
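A minimal sketch of the quasi-arithmetic mean follows, with the generator and its inverse passed in explicitly. The function name `quasi_arithmetic_mean` and the sample values are illustrative assumptions.

```python
import math


def quasi_arithmetic_mean(xs, f, f_inv):
    """Return f_inv applied to the average of f over xs."""
    return f_inv(sum(f(x) for x in xs) / len(xs))


xs = [1.0, 2.0, 4.0]
arithmetic = quasi_arithmetic_mean(xs, lambda x: x, lambda u: u)
geometric = quasi_arithmetic_mean(xs, math.log, math.exp)
harmonic = quasi_arithmetic_mean(xs, lambda x: 1.0 / x, lambda u: 1.0 / u)
```

With these three generators the function recovers the arithmetic mean 7/3, the geometric mean 2 and the harmonic mean 12/7 of the sample.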

Let $I$ be a compact non-degenerate interval of $\mathbb{R}$ and $M_n : I^n \to \mathbb{R}$, $n \ge 1$, be a
sequence of functions. It is known that the sequence of functions $M_n$, $n \ge 1$, is quasi-arithmetic,
that is, there exists a continuous strictly monotone function $f : I \to \mathbb{R}$
such that

\[ M_n(x_1, \ldots, x_n) = f^{-1}\Big( \frac{1}{n} \sum_{i=1}^{n} f(x_i) \Big), \quad x_1, \ldots, x_n \in I,\ n \ge 1, \]

if and only if the sequence $M_n$, $n \ge 1$, satisfies the following properties:

(i) $M_n$ is continuous and strictly increasing in each variable for every $n \ge 1$;

(ii) $M_n$ is symmetric for each $n \ge 1$, that is, $M_n(x_1, \ldots, x_n) = M_n(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$
for every permutation $\sigma(1), \ldots, \sigma(n)$ of $1, \ldots, n$;

(iii) $M_n(x, \ldots, x) = x$, $x \in I$; and

(iv) $M_{n+m}(x_1, \ldots, x_n, y_1, \ldots, y_m) = M_{n+m}(\bar{x}_n, \ldots, \bar{x}_n, y_1, \ldots, y_m)$ for every $n \ge 1$,
$m \ge 1$, $x_1, \ldots, x_n, y_1, \ldots, y_m \in I$, where $\bar{x}_n = M_n(x_1, \ldots, x_n)$

(cf. Kolmogorov [12], Nagumo [14] and de Finetti [7]).

Bajraktarevic [3] generalized the notion of the quasi-arithmetic mean. Let $n \ge 1$
and $I$ be a non-degenerate interval of $\mathbb{R}$. Let $f : I \to \mathbb{R}$ be a continuous and
strictly monotone function and $p : I \to (0, \infty)$ be a weight function. Then the
Bajraktarevic mean $B_n^{f,p} : I^n \to I$ is given by

\[ B_n^{f,p}(x_1, \ldots, x_n) = f^{-1}\Big( \frac{\sum_{i=1}^{n} p(x_i) f(x_i)}{\sum_{i=1}^{n} p(x_i)} \Big), \quad x_1, \ldots, x_n \in I. \]

The pair $(f, p)$ is called the generator of $B_n^{f,p}$. It would be interesting to find out
whether the Jensen inequality can be extended to Bajraktarevic means and other
types of quasi-arithmetic means defined above for positive random variables.
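The Bajraktarevic mean can be sketched in the same style, with the weight function entering both the numerator and the denominator. The function name `bajraktarevic_mean` and the two weight choices below are assumptions made for illustration.

```python
def bajraktarevic_mean(xs, f, f_inv, p):
    """Return f_inv(sum p(x) f(x) / sum p(x)) for a positive weight function p."""
    num = sum(p(x) * f(x) for x in xs)
    den = sum(p(x) for x in xs)
    return f_inv(num / den)


def identity(x):
    return x


xs = [1.0, 2.0, 4.0]
unweighted = bajraktarevic_mean(xs, identity, identity, lambda x: 1.0)  # arithmetic mean
self_weighted = bajraktarevic_mean(xs, identity, identity, identity)    # sum x^2 / sum x
```

A constant weight recovers the quasi-arithmetic mean with the same generator (here the arithmetic mean 7/3), while the weight $p(x) = x$ gives $\sum x_i^2 / \sum x_i = 3$ for this sample.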
Outline

We commence this article with a summary of the key results developed by Prof. 
Calyampudi Radhakrishna Rao related to his research on generalized matrix inverses. 
This is followed by a personal chronology of how the author became exposed to Prof. 
Rao’s fundamental research on generalized matrix inverses. We then present a wide 
variety of results developed by the author in applying Prof. Rao’s results to the field 
of Applied Probability. Detailed proofs are not provided for all the results, but full 
references are given. A description of the formal links that Prof. Hunter has had with 
Prof. Rao is also included. 


2.1 Professor Rao’s Research on Generalized Matrix 
Inverses 


There were three key publications that set the scene for Rao’s work on generalized
matrix inverses (see the references [28–30]). In addition, two other related publications
by Rao on generalized matrix inverses have been utilized in some of the applied
probability applications: [31, 32].

Let us comment briefly on the key results contained in these aforementioned
works.

Rao [29] Analysis of Dispersion for Multiply Classified Data with unequal numbers
in cells, Sankhya: The Indian Journal of Statistics, 15(3), 253-280.


In this paper, Rao introduces the concept of a “pseudo-inverse” of a singular matrix in
an appendix to the paper concerning the computational aspects of the general theory
of least squares. In matrix notation, the normal equations are expressed as $AT = Q$,
which is equivalent to $CAT = CQ = \Lambda (C^{-1})' T$ where $CA = \Lambda (C^{-1})'$, implying
$T = C'VCQ$ by formally treating $V$ as the inverse of $\Lambda$, where $\Lambda$ is a diagonal
matrix containing the diagonal elements of $CA$. Thus, $C'VC$ behaves as the inverse
of $A = C^{-1} \Lambda (C^{-1})'$.


Rao [30] A note on a generalized inverse of a matrix with applications to problems
in Mathematical Statistics, Journal of the Royal Statistical Society, Series B
(Methodological), 24(1), 152-158.


In this paper, Rao defines a generalized inverse (g-inverse) of a matrix $A$, of order
$m \times n$, as a matrix of order $n \times m$, denoted as $A^-$, such that for any $y$ for which
$Ax = y$ is consistent, $x = A^- y$ is a solution.


Lemma 2.1 If $A^-$ is a g-inverse of $A$, then $A A^- A = A$, and conversely.


A general solution to $Ax = y$, when consistent, is $x = A^- y + (A^- A - I)z$ where $z$
is arbitrary. The computation of a g-inverse, by the sweep-out process, is presented
and applied to the theory of least squares.
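The general solution quoted from Rao [30] can be checked on a small singular system. The sketch below is illustrative only: the matrix `A`, the particular g-inverse `G`, the right-hand side `y` and the helper names are assumptions chosen so that the arithmetic stays in integers.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    """Product of a matrix and a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2], [2, 4]]   # singular matrix of rank 1
G = [[1, 0], [0, 0]]   # one g-inverse of A (it satisfies A G A = A)
y = [3, 6]             # consistent right-hand side: y lies in the column space of A

def general_solution(z):
    """x = G y + (G A - I) z, a solution of A x = y for every choice of z."""
    GA = matmul(G, A)
    GA_minus_I = [[GA[i][j] - (i == j) for j in range(2)] for i in range(2)]
    return [p + q for p, q in zip(matvec(G, y), matvec(GA_minus_I, z))]

x_particular = general_solution([0, 0])   # the particular solution G y
x_other = general_solution([5, 7])        # another member of the solution family
```

Every choice of `z` yields a vector that `A` maps back to `y`, which is the content of Lemma 2.1 for this example.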


Rao [31] Linear Statistical Inference and its Applications, Wiley, New York. 


In this book, Rao builds on the previous paper and introduces the 4-condition
Moore-Penrose g-inverse, $A^+$ (to distinguish it from a 1-condition g-inverse, $A^-$),
satisfying the four properties: (1) $A A^+ A = A$, (2) $A^+ A A^+ = A^+$, (3) $(A A^+)' = A A^+$,
(4) $(A^+ A)' = A^+ A$. Such an inverse exists and is unique. This well-known g-inverse
was credited by Rao to E. H. Moore in 1935 [26] and R. Penrose in 1955
[28], although some of Moore’s results appeared earlier in 1920 [25] as an abstract
of his talk at an AMS meeting.


Rao [32] Generalized inverse for matrices and its applications in mathematical 
statistics, Research Papers in Statistics, Festschrift for J Neyman, Wiley, New 
York. 


In this paper, extending his earlier results, we have explicitly the following. 


Lemma 2.2 The necessary and sufficient condition that the equation $AX = Y$ is
consistent is that $A A^- Y = Y$.


This leads to the following lemma. 


Lemma 2.3 A necessary and sufficient condition for the equation $AXB = C$ to have
a solution is $A A^- C B^- B = C$, where $A^-$ is any g-inverse of $A$ and $B^-$ is any g-inverse
of $B$, in which case the general solution is $X = A^- C B^- + Z - A^- A Z B B^-$ where
$Z$ is arbitrary.


Rao [33] Calculus of generalized inverses of matrices, Part 1: General Theory,
Sankhya: The Indian Journal of Statistics, Series A, 29(3), 317-342.


Finally, this paper provides a theorem giving a representation of all g-inverses.


Theorem 2.1 The general solution to a g-inverse of a given $m \times n$ matrix $A$ is
$A^- + U - A^- A U A A^-$ where $A^-$ is a particular g-inverse and $U$ is an arbitrary
$n \times m$ matrix.
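To see Theorem 2.1 at work, the following sketch builds a second g-inverse from a particular one and an arbitrary matrix $U$. The matrices and the function name `g_inverse_from` are illustrative assumptions; integer entries keep the check exact.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def g_inverse_from(A, G, U):
    """Return G + U - (G A) U (A G), which is again a g-inverse of A."""
    correction = matmul(matmul(matmul(G, A), U), matmul(A, G))
    return [[G[i][j] + U[i][j] - correction[i][j] for j in range(len(G[0]))]
            for i in range(len(G))]

A = [[1, 2], [2, 4]]   # singular matrix of rank 1
G = [[1, 0], [0, 0]]   # a particular g-inverse of A
U = [[1, 2], [3, 4]]   # an arbitrary matrix of the same order as G

G_new = g_inverse_from(A, G, U)
```

Varying `U` sweeps out the whole family of g-inverses of `A`; here `G_new` differs from `G` but still satisfies the defining condition.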


Regarding multi-condition g-inverses, the two following lemmas are established. 


Lemma 2.4 The necessary and sufficient conditions that $X = GY$ is a minimum
norm solution of the consistent equation $AX = Y$ are (i) $AGA = A$, (ii) $(GA)' = GA$
(i.e. $G$ is an $A\{1, 4\}$ g-inverse).


Lemma 2.5 The necessary and sufficient conditions that $X = GY$ is a least squares
solution of the equation $AX = Y$ are (i) $AGA = A$, (ii) $(AG)' = AG$ (i.e. $G$ is an
$A\{1, 3\}$ g-inverse).


As we shall see, the importance of Rao’s work in applied probability lies in the
introduction of generalized matrix inverses and the application of Lemmas 2.1 and 2.3
to a variety of stochastic settings where linear equations arise naturally and unique
solutions are required. However, to apply such techniques the identification
of particular g-inverses is required. Other than assisting in the classification of
g-inverses, Lemmas 2.4 and 2.5 are more relevant to statistical applications
than to applied probability.

A number of authors have explored the applications of matrix theory to problems in 
applied probability but specific references to Rao’s contributions utilizing generalized 
inverses and the solution of linear equations are infrequent. 


2.2 Professor Hunter Is Exposed to Prof. Rao’s Research 


In February 1965, the author was in his second semester of coursework toward a 
Ph.D. in Statistics from the Department of Statistics, University of North Carolina 
at Chapel Hill. Although he was intending to do research in applied probability, 
one of the graduate-level courses, designed to enable a student to undertake the 
preliminary doctoral written examinations, was Statistics 150 “Analysis of Variance 
with Applications to Experimental Design” taught by Prof. Indra Mohan Chakravarti, 
using notes prepared by Prof. Raj Chandra Bose. In this course, the notion of a 
“conditional inverse” was introduced. The following are extracts from Bose’s notes 
(with the numbering changed for this article): 


1. Conditional inverse of a matrix. 
The inverse of a non-singular matrix has already been defined. We shall now consider a
generalization of the notion of inverse which we shall find useful in certain circumstances.
Let $A = A(m \times n)$ be any matrix. Then $A^* = A^*(n \times m)$ will be defined to be a conditional
inverse of $A$, if


$A A^* A = A$. (2.1)


When $A$ is square and non-singular, $A^{-1} A A^* A A^{-1} = A^{-1} A A^{-1}$. Hence $A^* = A^{-1}$ and is
uniquely determined. We shall show that in the general case $A^*$ exists but is not uniquely
determined by (2.1), when $A$ is given. We first prove the following theorem.


(a) If $A^* = A^*(n \times m)$ is a conditional inverse of $A = A(m \times n)$ and if the equations

$Ax = y$ (2.2)

are consistent, then $x_1$ is a solution of (2.2) if $x_1 = A^* y$.


(b) If $x_1 = A^* y$ is a solution of Eq. (2.2) for every $y$ for which (2.2) are consistent, then
$A^*$ must be a conditional inverse of $A$.


It is interesting to reflect on how the notion of a conditional inverse was used in
the lecture notes for this course, rather than the now common terminology of a
generalized inverse, or the simpler form, a g-inverse.

It is worth pointing out that there had been strong interactions between all three 
aforementioned academics. Firstly, Prof. Raj Chandra Bose joined the University 
of Calcutta in 1940 where C. R. Rao was in the first group of students that he 
taught. Professor Bose became Head of the Department of Statistics in 1945 at the 
University of Calcutta but moved to the Department of Statistics at the University of 
North Carolina at Chapel Hill in 1949. 

Secondly, Prof. Indra M. Chakravarti, who taught the aforementioned graduate- 
level course, obtained his Ph.D. from the Indian Statistical Institute at Kolkata in 
1958 having been supervised by Prof. Rao. Indra joined the Department of Statistics 
at the University of North Carolina at Chapel Hill in 1964. 

Thirdly, the connection between all three was strengthened in 1969 when all three 
edited the book “Essays in Probability and Statistics”, published in the Monograph 
Series in Probability and Statistics, by the University of North Carolina Press. 

The author had the opportunity to study various different areas of statistics, prior 
to deciding on a research topic, as part of the coursework requirements for a Ph.D. in 
the U.S. At the Dept. of Statistics at UNC Chapel Hill, he had the opportunity to take 
courses from a number of leading figures in their fields including Norman Johnson, 
Wassily Hoeffding, Walter Smith and Ross Leadbetter. 

The author’s first exposure to Prof. Rao’s research was as an undergraduate in
Mathematics at the University of Auckland, learning of some of his pioneering classical
results, in particular the Cramér-Rao lower bound. Besides meeting with Prof.
Rao, he also had the opportunity to meet with Prof. Harald Cramér during one of his
visits to the Department of Statistics at the University of North Carolina at Chapel
Hill while he was writing a book with Ross Leadbetter.


2.3 Professor Hunter’s Applications of Prof. Rao’s
Research in the Field of Applied Probability


Professor Hunter pursued his Ph.D. research in the general area of applied probability 
under the supervision of Walter Smith completing a thesis “On the necessary and 
sufficient conditions for the convergence of the renewal density matrix of a semi- 
Markov process”. His early research, following his Ph.D., focused on Markov renewal 
processes and their embedded Markov chains [2, 3, 7, 8]. 

Four papers set the scene for his research utilizing generalized matrix inverses in 
applied probability. Prior to discussing the main results of those papers, we introduce 
some notation that we shall use throughout this paper. 

Let $\{X_n\}$, $(n \ge 0)$, be a discrete time Markov chain (M. C.) with state space
$S = \{1, 2, \ldots, m\}$, transition probabilities $p_{ij} = P\{X_{n+1} = j \mid X_n = i\}$, $(i, j) \in S \times S$,
and transition matrix $P = [p_{ij}]$. Let $\Pi = e\pi^T$ where $e^T = (1, 1, \ldots, 1)$ and
$\pi^T$ is the stationary probability vector of the M. C. Let $E = [1]$ and, if $A = [a_{ij}]$,
define $A_d = [\delta_{ij} a_{ij}]$ (where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise), the diagonal matrix
of diagonal elements of $A$.

Let $\{(X_n, T_n)\}$ $(n \ge 0)$ be a Markov Renewal Process (M. R. P.) with semi-Markov
kernel $Q(t) = [Q_{ij}(t)]$ where $X_n$ denotes the state of the system following the $n$-th
transition, after the initial transition ($X_0$ being the initial state) in a finite state process
with state space $S$ and $T_n$ is the time of the $n$-th transition. Let
$Q_{ij}(t) = P\{X_{n+1} = j,\ T_{n+1} - T_n \le t \mid X_n = i\}$, $(i, j) \in S \times S$. Further, $P = [p_{ij}]$ where $p_{ij} = Q_{ij}(+\infty)$,
so that $P$ is the transition matrix of the embedded M. C. $\{X_n\}$. Let $T_{ij}$ be the time for
the first passage from state $i$ to $j$ in the M. R. P., and let $G_{ij}(t)$ be the distribution
function of $T_{ij}$.

Associated with the M. R. P. is the semi-Markov Process (S. M. P.) $Y_t = X_n$ for
$T_n \le t < T_{n+1}$, that records the state that the M. R. P. is in at each time $t$.

Further let $m_{ij}^{(k)} = \int_0^\infty t^k \, dG_{ij}(t)$ and $\mu_{ij}^{(k)} = \int_0^\infty t^k \, dQ_{ij}(t)$ with $m_{ij} = m_{ij}^{(1)}$ and
$\mu_{ij} = \mu_{ij}^{(1)}$, so that $m_{ij}^{(k)}$ is the $k$-th moment of the first passage time from state $i$ to
state $j$.

Define $M^{(k)} = [m_{ij}^{(k)}]$ and $P^{(k)} = [\mu_{ij}^{(k)}]$ with $M = M^{(1)}$ and $\mu = P^{(1)} e$.

Let $N_j(t)$ be the number of visits the M. R. P. makes to state $j$, up to and including
time $t$, $M(t) = [M_{ij}(t)]$ where $M_{ij}(t) = E\{N_j(t) \mid X_0 = i\}$, $t \ge 0$, i.e. $M_{ij}(t)$ is a
general renewal function of the process, the expected number of visits the M. R. P.
makes to state $j$ up to time $t$, with $X_0 = i$.

In the special case of a M. C. in discrete time, $T_n = n$, $\mu_{ij} = p_{ij}$, and

$$G_{ij}(t) = \begin{cases} 1, & t \ge 1, \\ 0, & t < 1; \end{cases}$$

and the $M_{ij}^{(n)}$ are the number of $k$ $(0 \le k \le n)$ such that $X_k = j$ given
$X_0 = i$, i.e. the number of visits the M. C. makes to state $j$, starting in state $i$, while
the $N_{ij}^{(n)}$ are the number of $k$ $(1 \le k \le n)$, so that $M_{ij}^{(n)} = N_{ij}^{(n)} + \delta_{ij}$.

For a M. C. in continuous time, the elements of the semi-Markov kernel have the
form $Q_{ij}(t) = p_{ij}\,[1 - e^{-\lambda_i t}]$.


Hunter [2] On the moments of Markov Renewal Processes, Advances in Applied
Probability, 1(2), 188-210.


In this first paper, the well-known fundamental matrix of regular Markov chains,
$Z = [I - P + \Pi]^{-1}$, as introduced by Kemeny and Snell, was shown to be a 1-condition
g-inverse of $I - P$ where $P$ is the transition matrix of the Markov chain. The proof
of this result used the well-known result ([23], p. 100) that $(I - P)Z = I - \Pi$; since
$\Pi P = \Pi$, this gives $(I - P)Z(I - P) = (I - \Pi)(I - P) = I - P$.
This matrix $Z$ was then applied to solving a system of linear equations that arose
in exploring the properties of a M. R. P. and the special case of a M. C.
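It was established that, for $r \ge 1$,

$$m_{ij}^{(r)} = \sum_{k \ne j} p_{ik} m_{kj}^{(r)} + \sum_{s=1}^{r-1} \binom{r}{s} \Big[ \sum_{k \ne j} \mu_{ik}^{(r-s)} m_{kj}^{(s)} \Big] + \mu_i^{(r)},$$

where $\mu_i^{(r)}$ denotes the $i$-th element of $P^{(r)} e$, leading to

$$(I - P) M^{(r)} = \sum_{s=1}^{r-1} \binom{r}{s} P^{(r-s)} \big[ M^{(s)} - M_d^{(s)} \big] + P^{(r)} E - P M_d^{(r)}.$$

In particular, when $r = 1$, $(I - P)M = P^{(1)} E - P M_d$.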

The paper discusses the solution of these matrix equations, for M. R. P.’s, by
solving them directly using generalized matrix techniques, and using $Z$ as a particular
g-inverse of $I - P$. In particular, the second main result was an expression for the
matrix of the mean first passage times,

$$M = \Big[ \frac{1}{\lambda_1} \big\{ Z P^{(1)} \Pi - E (Z P^{(1)} \Pi)_d \big\} + I - Z + E Z_d \Big] D,$$

where $\lambda_1 = \pi^T \mu$, $D = M_d = \lambda_1 (\Pi_d)^{-1}$.
The third main result of this paper provides an expression for the asymptotic 
values of the moments of the M. R. P., M(t), in terms of Z, 


er geet 
m+ |—ne® —7| Zz) —p@n—7| 7+ 0(1)E. 
Ay Ay 


Hunter [3] Generalized Inverses and their application to Applied Probability 
Problems, Linear Algebra & its Applications, 45, 157-198. 


The first key result in this paper was the derivation of more general g-inverses of
$I - P$, where $P$ is the transition matrix of a finite irreducible M. C. with stationary
probability vector $\pi^T$. In particular, if $u$ and $t$ are such that $u^T e \ne 0$ and $\pi^T t \ne 0$,
then $I - P + t u^T$ is non-singular and $[I - P + t u^T]^{-1}$ is a g-inverse of $I - P$.

The results of Rao [32] were utilized to show that any g-inverse of $I - P$ has the
form $[I - P + t u^T]^{-1} + e f^T + g \pi^T$.

In the paper, general procedures were established for finding stationary distributions,
moments of first passage time distributions and asymptotic forms of the
occupation time random variables for not only M. C.’s but more generally M. R.
P.’s and S. M. P.’s including M. C.’s in continuous time. In particular, some specific
special results for discrete time M. C.’s are as follows: Let $G$ be any g-inverse of
$I - P$.

If $A = I - (I - P)G$ then $\pi^T = \dfrac{v^T A}{v^T A e}$ for any $v^T$ such that $v^T A e \ne 0$.

$M = [G\Pi - E(G\Pi)_d + I - G + E G_d]\, D$ where $D = (\Pi_d)^{-1}$.

$[\mathrm{E} N_{ij}^{(n)}] = (n + 1)\Pi - I + (I - \Pi) G (I - \Pi) + o(1) E$.
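The first two statements above can be verified exactly on a small chain. The sketch below is illustrative only: the 3-state transition matrix `P`, the vectors `t`, `u`, `f`, `g` and all helper names are assumptions, and `K` is used as a local name for the kernel $I - P$. Exact rational arithmetic from the standard library avoids rounding issues.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Illustrative irreducible 3-state chain.
P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)
I = [[Fr(int(i == j)) for j in range(m)] for i in range(m)]
K = [[I[i][j] - P[i][j] for j in range(m)] for i in range(m)]   # the kernel I - P

def is_g_inverse(G):
    """Check the 1-condition (I - P) G (I - P) = I - P."""
    return matmul(matmul(K, G), K) == K

t, u = [1, 0, 0], [1, 1, 1]   # any t, u with pi^T t != 0 and u^T e != 0
G = inverse([[K[i][j] + t[i] * u[j] for j in range(m)] for i in range(m)])

KG = matmul(K, G)
A = [[I[i][j] - KG[i][j] for j in range(m)] for i in range(m)]   # A = I - (I - P) G
vA = [sum(A[i][j] for i in range(m)) for j in range(m)]          # v^T A with v = e
pi = [x / sum(vA) for x in vA]                                   # pi^T = v^T A / (v^T A e)

f_vec, g_vec = [2, -1, 5], [3, 0, 1]   # arbitrary vectors
G2 = [[G[i][j] + f_vec[j] + g_vec[i] * pi[j] for j in range(m)] for i in range(m)]
```

Both `G` and `G2` pass `is_g_inverse`, and `pi` is obtained without solving the stationary equations directly.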


Hunter [7] Characterizations of Generalized Inverses associated with Markovian
kernels, Linear Algebra & its Applications, 102, 121-142.


This paper derives characterizations of generalized inverses associated with Markovian
kernels $A = I - P$. We consider the standard four conditions of the Moore-Penrose
g-inverse together with a fifth condition, applicable for square matrices. We
let $A\{i, \ldots, k\}$ denote the family of all g-inverses $X$ of the matrix $A$ that satisfy
the conditions $i, \ldots, k$ from the following numbered conditions: (1) $AXA = A$,
(2) $XAX = X$, (3) $(AX)^T = AX$, (4) $(XA)^T = XA$, (5) for square matrices,
$AX = XA$.

Characterizations of $A\{1, 2\}$, $A\{1, 3\}$, $A\{1, 4\}$, $A\{1, 5\}$, $A\{1, j, k\}$ are all given.

Some special cases are considered.

The group inverse $A\{1, 2, 5\} = [I - P + e\pi^T]^{-1} - e\pi^T = (I - e\pi^T)\, G\, (I - e\pi^T)$
for any g-inverse $G$ of $I - P$. The use of this particular g-inverse in relation
to M. C.’s was explored extensively by Meyer [24].

The Moore-Penrose g-inverse $A\{1, 2, 3, 4\} = [I - P + \alpha \pi e^T]^{-1} - \alpha e \pi^T$,
where $\alpha = (m \pi^T \pi)^{-1/2}$, was given by Paige, Styan and Wachter [27].

It was also shown that one can transform from one g-inverse to another by using
the following result.

If, for $i = 1, 2$, $u_i$ and $t_i$ are such that $u_i^T e \ne 0$ and $\pi^T t_i \ne 0$ then

$$[I - P + t_2 u_2^T]^{-1} = (I - U_2)\,[I - P + t_1 u_1^T]^{-1}\,(I - T_2) + \beta_{22}\, e \pi^T,$$

where $U_2 = \dfrac{e u_2^T}{u_2^T e}$, $T_2 = \dfrac{t_2 \pi^T}{\pi^T t_2}$ and $\beta_{22} = \dfrac{1}{(\pi^T t_2)(u_2^T e)}$.

Another interesting result that helps in reformulating different forms of g-inverses
of $I - P$, in particular the Moore-Penrose g-inverse, was presented: For any $t$ and
$u$ such that $\pi^T t \ne 0$ and $u^T e \ne 0$,

$$A_\delta = [I - P + \delta t u^T]^{-1} - \frac{e \pi^T}{\delta (\pi^T t)(u^T e)}$$

does not depend on $\delta \ne 0$.

A range of applications using different g-inverses of $I - P$ in M. C.’s, used by a
number of other researchers, is given in the list of references of this paper, [7]. In
particular, Decell and Odell [1] used the Moore-Penrose g-inverse in the computation
of the stationary distribution of an ergodic M. C. Rising [34] also explored applications
of generalized inverses to M. C.’s, focusing on the derivation of the stationary
distribution and mean first passage times.


Hunter [8] Parametric Forms for Generalized Inverses of Markovian Kernels 
& their Applications, Linear Algebra & its Applications, 127, 71-84. 


Since g-inverses can appear in different forms, in this paper parametric forms for
generalized inverses of the Markovian kernel $I - P$ were derived. The main result
follows.

Given any g-inverse, $G$, of $I - P$, there exist unique parameters $\alpha$, $\beta$ and $\gamma$ such
that

$$G = [I - P + \alpha \beta^T]^{-1} + \gamma\, e \pi^T.$$

Let $A = I - (I - P)G$, $B = I - G(I - P)$; then $\alpha = Ae$, $\beta^T = \pi^T B$, $\gamma + 1 = \beta^T G \alpha$.

Note also that $\gamma + 1 = \pi^T G \alpha = \beta^T G e$, $\pi^T \alpha = 1$, $\beta^T e = 1$.

These parameters uniquely determine different classes of g-inverses of $I - P$:

$G \in A\{1, 2\} \iff \gamma = -1$,
$G \in A\{1, 3\} \iff \alpha = \pi / \pi^T \pi$,
$G \in A\{1, 4\} \iff \beta = e / e^T e$,
$G \in A\{1, 5\} \iff \alpha = e,\ \beta = \pi$.

Thus, $G$ is a Moore-Penrose g-inverse if $\alpha = \pi / \pi^T \pi$, $\beta = e / e^T e$ and $\gamma = -1$, and $G$
is a group inverse if $\alpha = e$, $\beta = \pi$ and $\gamma = -1$.
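The parametric representation can be illustrated by recovering the three parameters from a given g-inverse and rebuilding it. This is a sketch under stated assumptions: the chain `P`, the vectors `t` and `u` used to manufacture a g-inverse, and all names (`K` for the kernel $I - P$, `A_mat`, `B_mat`) are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)
I = [[Fr(int(i == j)) for j in range(m)] for i in range(m)]
K = [[I[i][j] - P[i][j] for j in range(m)] for i in range(m)]   # kernel I - P

t, u = [1, 0, 0], [1, 1, 1]   # illustrative choice giving some g-inverse G
G = inverse([[K[i][j] + t[i] * u[j] for j in range(m)] for i in range(m)])

KG, GK = matmul(K, G), matmul(G, K)
A_mat = [[I[i][j] - KG[i][j] for j in range(m)] for i in range(m)]   # A = I - (I - P) G
B_mat = [[I[i][j] - GK[i][j] for j in range(m)] for i in range(m)]   # B = I - G (I - P)

col = [sum(A_mat[i][j] for i in range(m)) for j in range(m)]
pi = [x / sum(col) for x in col]                                     # stationary vector

alpha = [sum(row) for row in A_mat]                                            # alpha = A e
beta = [sum(pi[i] * B_mat[i][j] for i in range(m)) for j in range(m)]         # beta^T = pi^T B
G_alpha = [sum(G[i][j] * alpha[j] for j in range(m)) for i in range(m)]
gamma = sum(beta[i] * G_alpha[i] for i in range(m)) - 1                        # gamma + 1 = beta^T G alpha

core = inverse([[K[i][j] + alpha[i] * beta[j] for j in range(m)] for i in range(m)])
rebuilt = [[core[i][j] + gamma * pi[j] for j in range(m)] for i in range(m)]

# Setting gamma = -1 with the same alpha and beta gives a reflexive, A{1, 2}, g-inverse.
G12 = [[core[i][j] - pi[j] for j in range(m)] for i in range(m)]
```

For this choice of `G` the recovered $\gamma$ is not $-1$, so `G` itself is not reflexive, whereas `G12` is.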

The early material of the author, [2, 3], utilizing generalized inverses and applying 
them to solving for stationary distributions and mean first passage times in M. C.’s 
is also included in his two books, [4, 5], and the e-book [19]. 


Professor Hunter has published a number of research papers that utilize the results
of the four papers described above, all utilizing generalized matrix inverses and
extending their applications in a number of applied probability areas. We summarize
the significant results but do not go into full details; readers are referred to the
individual papers for more information.


Hunter [6] Stationary distributions of perturbed Markov chains, Linear Algebra 
& its Applications, 82, 201-214. 


Techniques for updating the stationary distribution of a finite irreducible M. C., 
following a rank one perturbation of its transition matrix were discussed in this 
paper. 

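Let $P^{(2)} = P^{(1)} + a b^T$, with $b^T e = 0$, be the perturbed transition matrix, where
$P^{(1)}$ is the original transition matrix. Choose $u$ and $t$ (with $u^T e \ne 0$, $\pi^{(1)T} t \ne 0$).
Let $\alpha^T = u^T [I - P^{(1)} + t u^T]^{-1}$ and $\beta^T = b^T [I - P^{(1)} + t u^T]^{-1}$.
Then

$$\pi^{(1)T} = \frac{\alpha^T}{\alpha^T e} \quad \text{and} \quad \pi^{(2)T} = \frac{(\alpha^T a)\beta^T + (1 - \beta^T a)\alpha^T}{(\alpha^T a)\beta^T e + (1 - \beta^T a)\alpha^T e}.$$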
A variety of situations where such perturbations arise are presented together with 
suitable procedures for deriving the stationary distributions. 
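A rank-one update of this kind can be tried out on a small example. In the sketch below a single row of an illustrative transition matrix is changed, so that $a$ is a unit vector and $b^T$ is the difference between the new and old rows; the matrices, the choice $t = e$, $u = e_1$ and the variable names are all assumptions made for illustration.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

P1 = [[Fr(0), Fr(1, 2), Fr(1, 2)],
      [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
      [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P1)

a = [1, 0, 0]                       # perturb the first row only
b = [Fr(1, 2), Fr(0), Fr(-1, 2)]    # new first row minus old first row; sums to zero
P2 = [[P1[i][j] + a[i] * b[j] for j in range(m)] for i in range(m)]

t, u = [1, 1, 1], [1, 0, 0]         # satisfies u^T e != 0 and pi^T t != 0
G = inverse([[int(i == j) - P1[i][j] + t[i] * u[j] for j in range(m)] for i in range(m)])

alpha = [sum(u[i] * G[i][j] for i in range(m)) for j in range(m)]   # alpha^T = u^T G
beta = [sum(b[i] * G[i][j] for i in range(m)) for j in range(m)]    # beta^T = b^T G
alpha_a = sum(x * y for x, y in zip(alpha, a))
beta_a = sum(x * y for x, y in zip(beta, a))

pi1 = [x / sum(alpha) for x in alpha]
w = [alpha_a * beta[j] + (1 - beta_a) * alpha[j] for j in range(m)]
pi2 = [x / sum(w) for x in w]
```

Only one matrix inversion, for the unperturbed chain, is needed; the perturbed stationary vector follows from vector operations.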
Some subsidiary results relating the fundamental matrices of the two transition 
matrices are also derived, using obvious notation, 


(x) a) a zi) [ 4 ab’ Z) 


(2) _ Si Sea 
= [ 1—b' Za 1—b’ Za 


Hunter [9] The computation of stationary distributions of Markov chains 
through perturbations, Journal Applied Mathematics & Stochastic Analysis, 4 
(1), 29-46. 


An algorithmic procedure for the determination of the stationary distribution of a
finite, m-state, irreducible M. C., that does not require the traditional use of methods
for solving linear equations, is given in this paper. The construction of the proof
revolves around the results derived for perturbed M. C.’s.

The basic idea is simple. Start with the trivial doubly stochastic matrix $P_0$ with
each element having the value $1/m$. The stationary probability vector for this M.
C. is simply $\pi_0^T = e^T/m$. Now successively perturb this transition matrix $P_0$ by
replacing, for $i = 1, 2, \ldots, m$, its $i$-th row by the $i$-th row of the given
transition matrix so that at the $m$-th stage the resulting probability vector is the
required stationary probability vector. The procedure can be expressed as follows.


Let $A_0 = I$, $u_0^T = e^T/m$ and $e_0 = e$. For $i = 1, 2, \ldots, m$, let $p_i^T = e_i^T P$ (the $i$-th
row of $P$, with $e_i$ the $i$-th unit vector),

$$u_i^T = u_{i-1}^T + p_i^T - e^T/m,$$

$$A_i = A_{i-1} + A_{i-1}(e_{i-1} - e_i)\, \frac{u_{i-1}^T A_{i-1}}{u_{i-1}^T A_{i-1} e_i}.$$

At $i = m$, $\pi^T (= \pi_m^T) = u_m^T A_m / u_m^T A_m e$.

The paper includes some modifications and provides an illustration of the procedure
as an example.
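The row-by-row updating idea can be sketched in a few lines of Python. This is a hedged reading of the recursion, not the paper's own code: it assumes the convention $e_0 = e$ for the first step and uses $A_i = [I - P_i + e_i u_i^T]^{-1}$ updated by a rank-one correction; the function name and test chains are illustrative.

```python
from fractions import Fraction as Fr

def stationary_by_row_updates(P):
    """Stationary vector of an irreducible chain via successive row replacements.

    Starts from the uniform chain (every entry 1/m) and swaps in one row of P per
    step, updating a matrix A_i by a rank-one correction instead of solving a
    linear system. Assumes e_0 = e (vector of ones) for the first step.
    """
    m = len(P)
    A = [[Fr(int(i == j)) for j in range(m)] for i in range(m)]    # A_0 = I
    u = [Fr(1, m)] * m                                             # u_0^T = e^T / m
    e_prev = [Fr(1)] * m                                           # e_0 = e
    for i in range(m):
        e_i = [Fr(int(k == i)) for k in range(m)]
        # w = u_{i-1}^T A_{i-1}; its i-th entry is the scalar u_{i-1}^T A_{i-1} e_i
        w = [sum(u[k] * A[k][j] for k in range(m)) for j in range(m)]
        # Ad = A_{i-1} (e_{i-1} - e_i)
        Ad = [sum(A[r][k] * (e_prev[k] - e_i[k]) for k in range(m)) for r in range(m)]
        A = [[A[r][j] + Ad[r] * w[j] / w[i] for j in range(m)] for r in range(m)]
        u = [u[j] + P[i][j] - Fr(1, m) for j in range(m)]          # u_i^T
        e_prev = e_i
    w = [sum(u[k] * A[k][j] for k in range(m)) for j in range(m)]  # u_m^T A_m
    return [x / sum(w) for x in w]

P3 = [[Fr(0), Fr(1, 2), Fr(1, 2)],
      [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
      [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
P2 = [[Fr(1, 2), Fr(1, 2)],
      [Fr(1, 4), Fr(3, 4)]]

pi3 = stationary_by_row_updates(P3)
pi2 = stationary_by_row_updates(P2)
```

The sketch presumes every intermediate chain is irreducible so that no denominator vanishes; a production version would need to guard against that case.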


Hunter [10] Stationary distributions and mean first passage times in Markov 
chains using generalized inverses, Asia-Pacific Journal of Operational Research 
(2), 145-153. 


In this paper, we consider the joint derivation of stationary distributions and mean 
first passage times of M. C.’s using g-inverses of J — P (with some minor errors in 
the initial derivation, [10], corrected). 


(1) Compute $G = [g_{ij}]$, a g-inverse of $I - P$.

(2) Compute sequentially rows $1, 2, \ldots, r\ (\le m)$ of $A = I - (I - P)G = [a_{ij}]$,
until $\sum_{k=1}^m a_{1k} \ne 0$ $(r = 1)$, or $\sum_{k=1}^m a_{1k} = \sum_{k=1}^m a_{2k} = \cdots = \sum_{k=1}^m a_{r-1,k} = 0$,
$\sum_{k=1}^m a_{rk} \ne 0$ $(r \ne 1)$, where $a_{ij} = \delta_{ij} - g_{ij} + \sum_{k=1}^m p_{ik} g_{kj}$.

(3) Compute the stationary probabilities $\pi_j = a_{rj} / \sum_{k=1}^m a_{rk}$, $j = 1, 2, \ldots, m$.

(4) Set $m_{jj} = \sum_{k=1}^m a_{rk} / a_{rj}\ (= 1/\pi_j)$, $j = 1, 2, \ldots, m$.

(5) Compute, for $i \ne j$, $i, j = 1, 2, \ldots, m$,

$$m_{ij} = \frac{(g_{jj} - g_{ij}) \sum_{k=1}^m a_{rk}}{a_{rj}} + \sum_{k=1}^m (g_{ik} - g_{jk}).$$

In general, not all the elements of $A$ need to be obtained and, in most cases, only
the elements of the first row $a_{11}, a_{12}, \ldots, a_{1m}$ suffice. For such a scenario, all the
mean first passage times can be expressed in terms of the $m^2$ elements of the g-inverse
$G = [g_{ij}]$ and the $m$ elements $\{a_{1k}\}$ $(k = 1, 2, \ldots, m)$ of the first row of $A$.
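The five steps above translate directly into code. The following sketch is one possible rendering, with the function name `stationary_and_mfpt`, the example chain and the particular g-inverse all being illustrative assumptions. The g-inverse is deliberately chosen so that the first row of $A$ is zero, which exercises the row search in step (2).

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def stationary_and_mfpt(P, G):
    """Steps (2)-(5): stationary vector and mean first passage times from a g-inverse G."""
    m = len(P)
    for r in range(m):   # step (2): first row of A = I - (I - P)G with non-zero sum
        a_r = [int(r == j) - G[r][j] + sum(P[r][k] * G[k][j] for k in range(m))
               for j in range(m)]
        S = sum(a_r)
        if S != 0:
            break
    pi = [x / S for x in a_r]                       # step (3)
    row_sum = [sum(G[i]) for i in range(m)]         # g_i. = sum_k g_ik
    M = [[S / a_r[j] if i == j                      # step (4)
          else (G[j][j] - G[i][j]) * S / a_r[j] + row_sum[i] - row_sum[j]   # step (5)
          for j in range(m)] for i in range(m)]
    return pi, M, r

P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)
t, u = [0, 1, 0], [1, 1, 1]   # step (1): G = [I - P + t u^T]^{-1}, an assumed choice
G = inverse([[int(i == j) - P[i][j] + t[i] * u[j] for j in range(m)] for i in range(m)])

pi, M, r_used = stationary_and_mfpt(P, G)
```

The resulting matrix `M` can be checked against the defining equations $m_{ij} = 1 + \sum_{k \ne j} p_{ik} m_{kj}$.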


Hunter [11] Stationary distributions and mean first passage times of perturbed 
Markov chains, Linear Algebra & its Applications, 410, 217-243. 


Continuing in the theme of the earlier results of perturbed irreducible M. C.’s, it is 
shown that the mean first passage times play an important role in determining the 
differences between the stationary probabilities in the perturbed and unperturbed 
situations. The derivation of the interconnection, under the updating procedure, is 
explored through the use of generalized matrix inverses. 

Let $P = [p_{ij}]$ be the transition matrix of a finite irreducible, m-state M. C. Let
$\tilde{P} = [\tilde{p}_{ij}] = P + E$ be the transition matrix of the perturbed M. C. where $E = [\varepsilon_{ij}]$
is the matrix of perturbations (in the results that follow, $E$ denotes this perturbation
matrix rather than the matrix of ones). Let $\pi^T$ and $\tilde{\pi}^T$ be the associated stationary probability
vectors. Define $\Pi = e\pi^T$ and let $G$ be any g-inverse of $I - P$. If $H = G(I - \Pi)$
then, for any general perturbation $E$, $\tilde{\pi}^T - \pi^T = \tilde{\pi}^T E H$.

The paper also provides a re-expression of the above result in terms of the mean
first passage time matrix $M$, as follows: $\pi^T - \tilde{\pi}^T = \tilde{\pi}^T E (M - M_d)(M_d)^{-1}$.

Further, if $N = [n_{ij}] = [(1 - \delta_{ij}) m_{ij} / m_{jj}] = [(1 - \delta_{ij}) m_{ij} \pi_j]$, then
$\tilde{\pi}^T - \pi^T = -\tilde{\pi}^T E N$.

New improved bounds for the relative and absolute differences between the stationary
distributions under general perturbations were derived.

In particular, for each fixed $j$, $1 \le j \le m$,

$$|\pi_j - \tilde{\pi}_j| \le \frac{\|E\|_\infty}{2} \max_{i \ne j} \{n_{ij}\}, \quad \text{where } \|E\|_\infty = \max_{1 \le k \le m} \sum_{l=1}^m |\varepsilon_{kl}|.$$


Some interesting qualitative results regarding the nature of the relative and absolute 
changes to the stationary probabilities are also obtained. Similar procedures are used 
to establish an updating procedure for mean first passage times under perturbations. 
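These perturbation identities can be checked exactly on a small example. In the sketch below the two chains, the helper `g_inverse_e1` and the variable names are illustrative assumptions. The g-inverse used is $[I - P + e e_1^T]^{-1}$, whose first row is the stationary vector and whose row sums are all one, so that $n_{ij} = g_{jj} - g_{ij}$ for $i \ne j$.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def g_inverse_e1(P):
    """[I - P + e e_1^T]^{-1}: a g-inverse of I - P whose first row is pi^T."""
    m = len(P)
    return inverse([[int(i == j) - P[i][j] + int(j == 0) for j in range(m)]
                    for i in range(m)])

P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
P_tilde = [[Fr(1, 2), Fr(1, 2), Fr(0)],      # first row perturbed
           [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
           [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)
E = [[P_tilde[i][j] - P[i][j] for j in range(m)] for i in range(m)]   # perturbation matrix

G = g_inverse_e1(P)
pi, pi_tilde = G[0], g_inverse_e1(P_tilde)[0]

H = [[G[i][j] - sum(G[i]) * pi[j] for j in range(m)] for i in range(m)]    # H = G (I - Pi)
N = [[0 if i == j else G[j][j] - G[i][j] for j in range(m)] for i in range(m)]

piE = [sum(pi_tilde[k] * E[k][j] for k in range(m)) for j in range(m)]     # pi_tilde^T E
diff = [pi_tilde[j] - pi[j] for j in range(m)]
via_H = [sum(piE[k] * H[k][j] for k in range(m)) for j in range(m)]
via_N = [-sum(piE[k] * N[k][j] for k in range(m)) for j in range(m)]

norm_E = max(sum(abs(x) for x in row) for row in E)                        # infinity norm
bounds = [norm_E / 2 * max(N[i][j] for i in range(m)) for j in range(m)]
```

Both expressions reproduce the exact change in the stationary vector, and each component stays within its bound.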


Hunter [12] Mixing times with applications to perturbed Markov chains, Linear 
Algebra & its Applications, 417, 108-123. 


In this paper, a measure of the “mixing time” or “time to stationarity” in a finite
irreducible discrete time M. C. is introduced. For this paper, and the sequel [14],
the “time to mixing” in a M. C. $\{X_n\}$ can be defined as follows. First, sample from
the stationary distribution $\{\pi_j\}$ to determine a value of the random variable $Y$, say
$Y = j$, the state thus sampled. Then observe the M. C., starting at a given state $i$. Then
“mixing occurs at time $T = n$” when $X_n = j$ for the first such $n \ge 1$, i.e. conditional
upon $Y = j$, $T = T_{ij}$ is the first passage time from state $i$ to state $j$ (or return to
state $j$ when $i = j$). The finite state space restriction, under irreducibility conditions,
ensures the finiteness of the “mixing time $T$” (a.s.), with a finite expectation.

The statistic $\eta_i = \sum_{j=1}^m m_{ij} \pi_j$ is shown to be independent of the state $i$ that the
chain starts in (i.e. $\eta_i = \eta$ for all $i$), is minimal in the case of a periodic chain yet can
be arbitrarily large in a variety of situations.

Although not identified in this paper, $\eta$ is now known as Kemeny’s constant (see
[20]), which was indirectly established by Kemeny and Snell, Theorem 4.4.10 in
[23].

It is shown that $\eta$ can be derived from any g-inverse $G = [g_{ij}]$ of $I - P$ and the
stationary probabilities $\{\pi_j\}$. In particular, $\eta = 1 + \sum_{j=1}^m (g_{jj} - g_{j\cdot} \pi_j)$, where
$g_{j\cdot} = \sum_{k=1}^m g_{jk}$. As a special case, $\eta = \mathrm{tr}(Z)$, where $Z$ is the fundamental matrix of the M. C.

For any irreducible m-state M. C., $\eta \ge \dfrac{m + 1}{2}$.

For all irreducible m-state M. C.’s undergoing a general perturbation $E = [\varepsilon_{ij}]$,
with associated stationary probability vectors $\pi^T$ and $\tilde{\pi}^T$, it was shown that

$$\|\pi^T - \tilde{\pi}^T\|_1 \le (\eta - 1)\, \|E\|_\infty,$$

where the 1-norm $\|v\|_1$ of a vector $v$ is the absolute entry sum.
A variety of special cases are considered. 
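The independence of the expected time to mixing from the starting state, and its expression through an arbitrary g-inverse, can be confirmed numerically. The sketch below is illustrative: the chain, the two g-inverses and all names (`K` for $I - P$, `eta_by_start`, `eta_from_G2`) are assumptions, and exact rationals are used so that equalities hold without tolerance.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)
K = [[int(i == j) - P[i][j] for j in range(m)] for i in range(m)]   # kernel I - P

# g-inverse [I - P + e e_1^T]^{-1}: first row is pi^T and every row sums to one.
G = inverse([[K[i][j] + int(j == 0) for j in range(m)] for i in range(m)])
pi = G[0]

# Mean first passage times; with equal row sums, m_ij = (delta_ij + g_jj - g_ij) / pi_j.
M = [[(int(i == j) + G[j][j] - G[i][j]) / pi[j] for j in range(m)] for i in range(m)]
eta_by_start = [sum(M[i][j] * pi[j] for j in range(m)) for i in range(m)]

# Fundamental matrix Z = [I - P + Pi]^{-1} and its trace.
Z = inverse([[K[i][j] + pi[j] for j in range(m)] for i in range(m)])
trace_Z = sum(Z[j][j] for j in range(m))

# A different g-inverse, with unequal row sums, gives the same constant.
t, u = [0, 1, 0], [1, 2, 3]
G2 = inverse([[K[i][j] + t[i] * u[j] for j in range(m)] for i in range(m)])
eta_from_G2 = 1 + sum(G2[j][j] - sum(G2[j]) * pi[j] for j in range(m))
```

All three routes lead to the same value of Kemeny's constant for this chain.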


Hunter [13] Simple procedures for finding mean first passage times in Markov 
chains, Asia-Pacific Journal of Operational Research, 24 (6), 813-829. 


This paper provides elegant new results for finding mean first passage times based
upon using a selection of simple g-inverses of $I - P$, starting from the basic form of
a g-inverse as $[I - P + t u^T]^{-1}$ and then choosing $t$ and $u$, or the parameters $\alpha$, $\beta$
and $\gamma$.

We explore a variety of row and column properties of these g-inverses and explore
the joint computation of both the stationary distributions and the mean first passage
times. A special result deserves highlighting.

If $G_{eb} = [I - P + e e_b^T]^{-1} = [g_{ij}]$, where $e_b^T = (0, 0, \ldots, 1, 0, \ldots, 0)$ with the 1 in the
$b$-th position, then $\pi_j = g_{bj}$, $j = 1, 2, \ldots, m$, and

$$m_{ij} = \begin{cases} 1 / g_{bj}, & i = j, \\ (g_{jj} - g_{ij}) / g_{bj}, & i \ne j. \end{cases}$$


Hunter [14] Variance of first passage times and applications to mixing times in
Markov chains, Linear Algebra & its Applications, 429, 1135-1162.


In this paper, we explore the variance of the mixing time. We omit many of the 
technicalities of establishing expressions but a brief synopsis is as follows. 

Define $\eta_i^{(k)} = E[T^k \mid X_0 = i]$ as the $k$-th moment of the mixing time, starting in
state $i$, and let $m_{ij}^{(k)}$ be the $k$-th moment of the first passage time from state $i$ to state
$j$ in the M. C. For all $i$, $\eta_i^{(k)} = \sum_{j=1}^m m_{ij}^{(k)} \pi_j$.

The mean mixing time starting in state $i$, $\eta_i^{(1)} = \eta_i = \sum_{j=1}^m m_{ij} \pi_j = \eta$, a constant
(Kemeny’s constant) independent of $i$, the starting state.

Now, the variance of the mixing time when the M. C. starts in state $i$ is
$v_i = \mathrm{var}[T \mid X_0 = i] = \eta_i^{(2)} - \eta^2 = \sum_{j=1}^m m_{ij}^{(2)} \pi_j - \eta^2$; thus we first need to explore the
derivation of the first two moments of the first passage times.

By deriving expressions for the matrices of the first and second moments of the
first passage times from state $i$ to state $j$ in an irreducible finite M. C. with transition
matrix $P$, we establish that, if $G = [g_{ij}]$ is any g-inverse of $I - P$, the vector of
variances $v = (v_1, \ldots, v_m)^T$ is given by

$$v = 2[I - G + E G_d]\alpha + 2\eta G e - [2\,\mathrm{tr}(GL) + \eta + \eta^2]\, e,$$

where $\alpha = (\pi^T G e) e - (GE)_d e + e - (\Pi G)_d D e + G_d D e = (\Pi M)_d e$,
$L = G\Pi - E(G\Pi)_d + I - G + E G_d$ and $\eta = 1 + \sum_{j=1}^m (g_{jj} - g_{j\cdot} \pi_j)$ with $g_{j\cdot} = \sum_{k=1}^m g_{jk}$.

The variances $v_i$ are shown to depend on $i$ and an exploration of recommended
starting states, given knowledge of the transition probabilities, is considered.


Hunter [16] Some stochastic properties of semi-magic and magic Markov chains, 
Linear Algebra & its Applications, 433, 893-907. 


Magic squares are arrays of positive integers that have four main properties (1) equal 
row sums, (2) equal column sums, (3) the main diagonal sum and (4) the main skew- 
diagonal, all equal to the same sum, the “magic sum”. By dividing each element in 
the array by the magic sum, we obtain the transition matrix of a M. C., a “magic” 
M. C. This transition matrix is a “super-stochastic” matrix. If we allow one or both 
of the properties (3) or (4) to be relaxed, we have a “semi-magic” M. C., still with a 
doubly stochastic transition matrix. 

The paper explored the main stochastic properties of “magic” and “semi-magic”
M. C.’s (and their diagonal and skew-diagonal variants). A number of different types
of g-inverses are explored, some of which have constant row sums and some of which
are semi-magic. We also explored multi-condition g-inverses of the Markovian kernels
under double stochasticity as well as the properties of the diagonal sums and
skew-diagonal sums. We do not go into any details, but an interested reader can consult the
paper, [16], which also explores the stationary distribution, the mean first passage
times, the variances of the first passage times and the expected times to mixing. Some
general results were developed, some conjectures were presented and some special
cases, involving three and four states, were explored in detail.


Hunter [17] Markov chain properties in terms of column sums of the transition 
matrix, Acta et Commentationes Universitatis Tartuensis de Mathematica. 16, 
(1), 33-51. 


In this paper, questions are posed regarding the influence that the column sums of
the transition probabilities of a stochastic matrix (with row sums all one) have on the
stationary distribution, the mean first passage times and the Kemeny constant of the
associated irreducible discrete time M. C. Some new relationships, including some
inequalities, and partial answers to the questions are given.

Let $P = [p_{ij}]$ be the transition matrix of a finite irreducible, discrete time M. C.
$\{X_n\}$, $n \ge 0$, with state space $S = \{1, 2, \ldots, m\}$. The stochastic nature of $P$ implies
that the row sums of $P$ are all one. Let $c_j = \sum_{i=1}^m p_{ij}$, for all $j = 1, 2, \ldots, m$, be
the respective column sums of the transition matrix.

We pose the following questions: What influence does the sequence $\{c_j\}$ have
on the stationary distribution $\{\pi_j\}$? What influence does the sequence $\{c_j\}$ have on
the mean first passage times $\{m_{ij}\}$? Are there relationships connecting the $\{c_j\}$, the
$\{\pi_j\}$ and the $\{m_{ij}\}$? Can we deduce bounds on the $\{\pi_j\}$ and the $\{m_{ij}\}$ involving $\{c_j\}$?
What effect does the $\{c_j\}$ have on Kemeny’s constant $K = \sum_{j=1}^m \pi_j m_{ij}$, which is in
fact independent of $i \in \{1, 2, \ldots, m\}$?

To explore these questions, we use a special generalized matrix inverse that had not
previously been considered in the literature on M. C.’s. In particular, we define
$H = [I - P + e c^T]^{-1}$, where $e$ is the column vector of ones and $c^T = (c_1, c_2, \ldots, c_m)$.
We show that Kemeny and Snell’s fundamental matrix, $Z$, can be expressed in terms
of $H$.

Some simple results:


(1) $c^T H = \pi^T$ implying $\pi_j = \sum_{i=1}^m c_i h_{ij}$.

(2) $He = e/m$ implying $h_{i\cdot} = \sum_{j=1}^m h_{ij} = 1/m$.

(3) $c^T H e = 1$, with $c^T e = \sum_{j=1}^m c_j = m$.

(4) $h_{\cdot j} = \sum_{i=1}^m h_{ij}$ yielding $h_{\cdot j} = 1 - (m - 1)\pi_j$.

(5) Kemeny and Snell’s fundamental matrix $Z = [I - P + \Pi]^{-1} = \Pi + (I - \Pi)H$.

(6) The mean first passage times,

$$m_{ij} = \begin{cases} \dfrac{1}{\pi_j} = \dfrac{1}{\sum_{k=1}^m c_k h_{kj}}, & i = j, \\[2ex] \dfrac{h_{jj} - h_{ij}}{\pi_j} = \dfrac{h_{jj} - h_{ij}}{\sum_{k=1}^m c_k h_{kj}}, & i \ne j. \end{cases}$$

(7) Kemeny’s constant, $K = 1 - (1/m) + \mathrm{tr}(H)$.
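The listed properties of the column-sum g-inverse can be checked exactly on a small chain. The sketch below is illustrative only: the transition matrix and the variable names are assumptions, and exact rational arithmetic from the standard library is used so that each identity can be tested with equality.

```python
from fractions import Fraction as Fr

def inverse(A):
    """Exact Gauss-Jordan inverse; assumes A is square and non-singular."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

P = [[Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)]]
m = len(P)

c = [sum(P[i][j] for i in range(m)) for j in range(m)]               # column sums of P
H = inverse([[int(i == j) - P[i][j] + c[j] for j in range(m)]        # (e c^T)_ij = c_j
             for i in range(m)])

pi = [sum(c[i] * H[i][j] for i in range(m)) for j in range(m)]       # result (1)
row_sums = [sum(row) for row in H]                                   # result (2)
col_sums = [sum(H[i][j] for i in range(m)) for j in range(m)]        # result (4)

Z = inverse([[int(i == j) - P[i][j] + pi[j] for j in range(m)] for i in range(m)])
Z_from_H = [[pi[j] + H[i][j] - sum(pi[k] * H[k][j] for k in range(m))   # Pi + (I - Pi) H
             for j in range(m)] for i in range(m)]                      # result (5)

M = [[(int(i == j) + H[j][j] - H[i][j]) / pi[j] for j in range(m)]   # result (6)
     for i in range(m)]
kemeny = 1 - Fr(1, m) + sum(H[j][j] for j in range(m))               # result (7)
```

Here a single inversion yields the stationary vector, the fundamental matrix, the mean first passage times and Kemeny's constant.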


Hunter [18] Generalized inverses of Markovian kernels in terms of properties 
of the Markov chain, Linear Algebra & its Applications, 447, 38-55. 


Generalized matrix inverses (g-inverses) of $I - P$ are typically used to solve systems
of linear equations, including those that yield a variety of the properties of the M. C. In particular, the
$\{\pi_j\}$ and the $\{m_{ij}\}$ can be found in terms of g-inverses, either in matrix form or in
terms of the elements of the g-inverse. What was not known was that the elements
of every g-inverse of $I - P$ can be expressed in terms of the stationary probabilities
$\{\pi_j\}$ and the mean first passage times $\{m_{ij}\}$ of the associated M. C. The key thrust of
this paper is first to identify the parameters that characterize different sub-families of
g-inverses of $I - P$, and then to assign to each sub-family, thus characterized in terms of
its parameters $\alpha$, $\beta$ and $\gamma$, explicit expressions for the elements of the g-inverses
in terms of the $\{\pi_j\}$ and the $\{m_{ij}\}$.
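The main result of the paper is the following.

Let $G = G(\alpha, \beta, \gamma)$ be any g-inverse of $I - P$. Then the elements of $G = [g_{ij}]$ can be
expressed in terms of the parameters $\{\alpha_j\}$, $\{\beta_j\}$, $\gamma$, the stationary probabilities $\{\pi_j\}$, and
the mean first passage times $\{m_{ij}\}$ of the Markov chain, with $\delta_j \equiv \sum_{k \ne j} \beta_k m_{kj}$, as

$$g_{ij} = \begin{cases} \Bigl(1 + \gamma + \delta_j - m_{ij} + \sum_{k \ne i} \pi_k \alpha_k m_{ik} - \sum_{k=1}^{m} \pi_k \alpha_k \delta_k\Bigr)\pi_j, & i \ne j, \\[1.5ex] \Bigl(1 + \gamma + \delta_j + \sum_{k \ne j} \pi_k \alpha_k m_{jk} - \sum_{k=1}^{m} \pi_k \alpha_k \delta_k\Bigr)\pi_j, & i = j. \end{cases}$$

Further, for all g-inverses $G = G(\alpha, \beta, \gamma)$ of $I - P$, with $\delta_j \equiv \sum_{k \ne j} \beta_k m_{kj}$,

$$\sum_{k=1}^{m} \pi_k \delta_k = K - 1,$$

where $K = \sum_{j=1}^{m} \pi_j m_{ij}$, a constant (Kemeny's constant) for all $i = 1, 2, \ldots, m$.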
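A numerical illustration of the element-wise expression may help. The sketch below rests on an assumption that is not stated in this section: that $G(\alpha, \beta, \gamma)$ denotes the parametric form $[I - P + \alpha\beta^T]^{-1} + \gamma e\pi^T$ with $\pi^T\alpha = 1$ and $\beta^T e = 1$. The transition matrix and the parameter values are hypothetical.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])            # hypothetical irreducible chain
m = P.shape[0]
I, e = np.eye(m), np.ones(m)

# Stationary probabilities: pi^T (I - P + e e^T) = e^T.
pi = np.linalg.solve((I - P + np.outer(e, e)).T, e)

# Mean first passage times, one target state j at a time.
M = np.zeros((m, m))
for j in range(m):
    Q = P.copy()
    Q[:, j] = 0.0
    M[:, j] = np.linalg.solve(I - Q, e)
M0 = M - np.diag(np.diag(M))               # m_ij with the diagonal set to 0
K = (M @ pi)[0]                            # Kemeny's constant

# Assumed parametrization (not stated in this section):
# G = [I - P + alpha beta^T]^{-1} + gamma e pi^T, pi^T alpha = 1, beta^T e = 1.
rng = np.random.default_rng(0)
alpha = rng.uniform(0.5, 1.5, m)
alpha /= pi @ alpha
beta = rng.uniform(0.5, 1.5, m)
beta /= beta.sum()
gamma = 0.7
G = np.linalg.inv(I - P + np.outer(alpha, beta)) + gamma * np.outer(e, pi)

delta = beta @ M0                          # delta_j = sum_{k != j} beta_k m_kj
S = M0 @ (pi * alpha)                      # sum_{k != i} pi_k alpha_k m_ik
D = (pi * alpha) @ delta                   # sum_k pi_k alpha_k delta_k
G_formula = (1 + gamma + delta[None, :] - M0 + S[:, None] - D) * pi[None, :]
```

Under these assumptions `G` is a g-inverse of $I - P$, `G_formula` reproduces it element by element, and `pi @ delta` equals `K - 1`.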


The author has also written other papers where generalized matrix inverses have 
been utilized, and these are included in the references (in particular [15, 20—22]), but 
in these papers the g-inverses are used primarily as well-known tools for deriving 
relevant properties of Markov chains. 


2.4 Professor Hunter’s Links with Prof. Rao 


Professor Hunter has had the opportunity to meet personally with Prof. Rao and was 
involved in inviting and hosting Prof. Rao during his visit to New Zealand in 2005. 
We were delighted to have him participate in an extensive programme, especially 
onerous for someone of his age. 

The organization of his visit required considerable coordination and a number 
of individuals and organizations were involved. He started his visit to the South 
Island with a succession of seminars, firstly at the University of Otago, Dunedin 
(14 March), then the University of Canterbury, Christchurch (15 March), across to 
the North Island with seminars at Victoria University of Wellington (17 March) 
and Massey University, Palmerston North (18 March). Moving up the island, Prof. 
Nye John organized a CR Rao Workshop on Data Scrutiny and Data Mining at the 
Waikato Centre of Applied Statistics at University of Waikato, Hamilton (22 March) 
followed by a seminar at the University of Auckland (23 March). The final activity
was a Public Lecture as the Keynote Speaker at the opening of the International
Workshop on Matrices and Statistics (IWMS 2005) at Massey University, Albany,
Auckland (29 March).

The NZ Statistical Association hosted him as its Visiting Lecturer. Professor
Hunter arranged his appointment as a Distinguished Visitor to Massey University
and hosted him at IWMS 2005, and Dr. Simo Puntanen organized his
appointment as the Nokia Lecturer at IWMS 2005.

The following year, Prof. Hunter was invited by Prof. Rao to be a Keynote speaker 
at a two-day “Workshop on Matrix Theory and in Physical, Biological and Social 
Sciences” at Penn State University, University Park, PA, USA, July 27-28, 2006.
The invited speakers were G. H. Golub (Stanford Univ), T. Kailath (Stanford Univ),
J. Hunter (Massey Univ) and M. B. Rao (Univ Cincinnati). During this time, Prof. 
Hunter also had the opportunity to be hosted by Prof. Rao and his wife, Bhargavi, at 
their home at University Park. 


2.5 Final Comments 


While the significance of Prof. Rao’s contributions to a number of statistical fields 
is well known, the author is proud to have been able to apply his outstanding con- 
tributions on generalized matrix inverses and their use in solving systems of linear 
equations to a large number of problems in applied probability. He also values the 
opportunity to have met him and wishes to convey to him his congratulations and 
best wishes for attaining such a milestone birthday.


the Partitioned Linear Fixed Model
and the Corresponding Mixed Model


3.1 Introduction


We will consider the partitioned linear fixed effects model

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \quad \mathrm{cov}(\varepsilon) = V, \quad \mathrm{E}(\varepsilon) = 0,$$

or shortly denoted as

$$\mathscr{F} = \{y,\ X_1\beta_1 + X_2\beta_2,\ V\} = \{y,\ X\beta,\ V\},$$

where $X_1$ and $X_2$ are known matrices of orders $n \times p_1$ and $n \times p_2$, respectively, $p_1 + p_2 = p$, $X = (X_1 : X_2)$,
$\beta_1 \in \mathbb{R}^{p_1}$ and $\beta_2 \in \mathbb{R}^{p_2}$ are vectors of unknown (but fixed) effects, and $\beta = (\beta_1', \beta_2')'$;
here $'$ stands for the transpose. The covariance matrix $V$ of the random error vector
$\varepsilon$ is assumed to be known.

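Consider then the linear mixed model $\mathscr{M}$, which is obtained from $\mathscr{F}$ by replacing
the fixed vector $\beta_2$ with the random effect vector $u$:

$$\mathscr{M}:\ y = X_1\beta_1 + X_2u + \varepsilon, \quad \mathrm{cov}(\varepsilon) = V, \quad \mathrm{E}(\varepsilon) = 0,$$

where $X_1$, $X_2$, $\beta_1$, and $\varepsilon$ are as in $\mathscr{F}$, $u$ is an unobservable vector of random effects
with $\mathrm{E}(u) = 0$, $\mathrm{cov}(u) = D$, $\mathrm{cov}(\varepsilon, u) = 0$; $V$ and $D$ are assumed to be known.
Notice that under $\mathscr{F}$ we have $\mathrm{cov}(y) = V$ but under $\mathscr{M}$,

$$\mathrm{cov}(y) = \mathrm{cov}(X_2u + \varepsilon) = X_2DX_2' + V = \Sigma.$$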


As for notation, $r(A)$, $A^-$, $A^+$, $\mathscr{C}(A)$, and $\mathscr{C}(A)^\perp$ denote, respectively, the rank, a
generalized inverse, the (unique) Moore-Penrose inverse, the column space, and
the orthogonal complement of the column space of $A$. By $A^\perp$ we denote any matrix satisfying
$\mathscr{C}(A^\perp) = \mathscr{C}(A)^\perp$. Furthermore, we will write $P_A = AA^+ = A(A'A)^-A'$ to denote
the orthogonal projector onto $\mathscr{C}(A)$. The orthogonal projector onto $\mathscr{C}(A)^\perp$ is denoted
as $Q_A = I_a - P_A$, where $I_a$ refers to the $a \times a$ identity matrix and $a$ is the number
of rows of $A$. We use the short notations

$$M = I_n - P_X \in \{X^\perp\}, \qquad M_i = I_n - P_{X_i} \in \{X_i^\perp\}, \quad i = 1, 2.$$


If we are interested in comparing the BLUEs of $K_1\beta_1$ under $\mathscr{F}$ and $\mathscr{M}$, then $K_1\beta_1$
has to be estimable in both models. It is well known that $K_1\beta_1$ is estimable under $\mathscr{F}$
if and only if $\mathscr{C}(K_1') \subset \mathscr{C}(X_1'M_2)$, i.e., $K_1 = LM_2X_1$ for some matrix $L$. In other
words, the benefit of concentrating on estimating $\theta_1 = M_2X_1\beta_1$ is that the properties
obtained are valid for all parametric functions of the type $K_1\beta_1$ that are estimable
under the model $\mathscr{F}$.

Putting $K_1 = X_1$ we immediately observe that $\mu_1 = X_1\beta_1$ is estimable if and
only if $\mathscr{C}(X_1') \subset \mathscr{C}(X_1'M_2)$, i.e., $\mathscr{C}(X_1') = \mathscr{C}(X_1'M_2)$, so that

$$\mu_1 = X_1\beta_1 \text{ is estimable under } \mathscr{F} \iff \mathscr{C}(X_1) \cap \mathscr{C}(X_2) = \{0\}.$$


For Lemma 1, characterizing the BLUE, see, e.g., Rao [17, p. 282].


Lemma 1 Consider the models $\mathscr{F}$ and $\mathscr{M}$ and denote $\Sigma = X_2DX_2' + V$. Then the
following statements hold:


1. $Ay$ is the BLUE for estimable $K\beta = (K_1 : K_2)\beta$ in $\mathscr{F}$ if and only if
$A(X_1 : X_2 : VM) = (K_1 : K_2 : 0)$, i.e., $A \in \mathcal{P}_{K\beta|\mathscr{F}}$.


2. $By$ is the BLUE for estimable $K_1\beta_1$ under $\mathscr{M}$ if and only if
$B(X_1 : \Sigma M_1) = (K_1 : 0)$, i.e., $B \in \mathcal{P}_{K_1\beta_1|\mathscr{M}}$.


Notice that the notation $\mathcal{P}_{K\beta|\mathscr{F}}$ refers to the set of all solutions (for $A$) of the equation
$A(X_1 : X_2 : VM) = (K_1 : K_2 : 0)$, i.e., it refers to the set of matrices $A$ such that $Ay$ is
the BLUE for estimable $K\beta = (K_1 : K_2)\beta$ in model $\mathscr{F}$.

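It is well known, see, e.g., Rao [17], that the matrix

$$G_{\mu|\mathscr{F}} = X(X'W^-X)^-X'W^+, \tag{3.1}$$

where

$$W = X_1X_1' + X_2X_2' + V = XX' + V, \tag{3.2}$$

is one solution for $A$ in the equation $A(X : VM) = (X : 0)$, and thus

$$G_{\mu|\mathscr{F}}\,y = \mathrm{BLUE}(\mu \mid \mathscr{F}), \quad G_{\mu|\mathscr{F}} \in \mathcal{P}_{\mu|\mathscr{F}}, \quad \text{where } \mu = X\beta.$$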


It is not necessary to choose $W$ as in (3.2). For example, $W$ could be replaced with
any matrix from the class $\mathcal{W}(V)$, where

$$\mathcal{W}(V) = \{W \in \mathbb{R}^{n \times n} : W = V + XUX',\ \mathscr{C}(W) = \mathscr{C}(X : V)\}. \tag{3.3}$$

In (3.3) $U$ can be any $p \times p$ matrix as long as $\mathscr{C}(W) = \mathscr{C}(X : V)$ is satisfied; see,
e.g., Puntanen et al. [16, Sect. 12.3]. For example, $U$ could always be chosen as $I_p$
and in particular, when $\mathscr{C}(X) \subset \mathscr{C}(V)$, $U$ could be chosen as $0$. It is noteworthy that
the matrix $G_{\mu|\mathscr{F}}$ in (3.1) is unique with respect to the choices of generalized inverses
denoted as "$-$". This is based on the fact that for nonnull matrices $A$ and $C$, the
product $AB^-C$ is invariant for the choice of $B^-$ if and only if

$$\mathscr{C}(C) \subset \mathscr{C}(B) \quad \text{and} \quad \mathscr{C}(A') \subset \mathscr{C}(B');$$

see, e.g., Rao and Mitra [18, Lemma 2.2.4]. This invariance property is frequently
used in this paper. Notice also that for all generalized inverses involved, we have

$$X(X'W^-X)^-X'W^- \in \mathcal{P}_{\mu|\mathscr{F}}.$$


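Below are some solutions providing BLUEs for $\theta_1 = M_2X_1\beta_1$ under $\mathscr{F}$ and $\mathscr{M}$,
see, e.g., Puntanen et al. [16, Chap. 10]:

$$G_{\theta_1|\mathscr{F}} = M_2X_1(X_1'\dot{M}_2X_1)^-X_1'\dot{M}_2 \in \mathcal{P}_{\theta_1|\mathscr{F}},$$
$$G_{\theta_1|\mathscr{M}} = M_2X_1(X_1'W_m^-X_1)^-X_1'W_m^+ \in \mathcal{P}_{\theta_1|\mathscr{M}},$$

i.e., $G_{\theta_1|\mathscr{F}}\,y$ and $G_{\theta_1|\mathscr{M}}\,y$ are BLUEs for $\theta_1$ under $\mathscr{F}$ and $\mathscr{M}$, respectively; above

$$W_m = X_1X_1' + \Sigma = X_1X_1' + X_2DX_2' + V,$$

and the matrix $\dot{M}_2$ is defined as

$$\dot{M}_2 = M_2(M_2WM_2)^+M_2.$$

Moreover, see, e.g., Puntanen et al. [16, Chap. 15], $\dot{M}_2$ satisfies

$$\dot{M}_2 = M_2(M_2WM_2)^+M_2 = M_2(M_2WM_2)^+ = (M_2WM_2)^+.$$

Trivially $VM = \Sigma M$, and denoting $W_1 = X_1X_1' + V$, we have

$$M_2W = M_2W_1 = M_2W_m, \qquad M_1W_m = M_1\Sigma.$$
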
The solutions to the equations in Lemma 1 dealing with $\mathscr{F}$ are unique if and only if
$\mathscr{C}(W) = \mathbb{R}^n$, while those dealing with $\mathscr{M}$ are unique if and only if $\mathscr{C}(W_m) = \mathbb{R}^n$.
The general solution for $A$ in

$$A(X_1 : X_2 : VM) = (M_2X_1 : 0 : 0)$$

can be expressed, e.g., as

$$A_0 = G_{\theta_1|\mathscr{F}} + EQ_W = M_2X_1(X_1'\dot{M}_2X_1)^-X_1'\dot{M}_2 + EQ_W,$$

where $E \in \mathbb{R}^{n \times n}$ is free to vary. By the consistency of the model $\mathscr{F}$ it is meant that
$y$ lies in $\mathscr{C}(W)$ with probability 1. Thus under the consistent model $\mathscr{F}$ the vector
$A_0y$ itself is unique once $y$ has been observed. The consistency in $\mathscr{M}$ means that $y$
belongs to $\mathscr{C}(W_m)$. Notice that

$$\mathscr{C}(W_m) = \mathscr{C}(X_1 : X_2D : V) \subset \mathscr{C}(X_1 : X_2 : V) = \mathscr{C}(W),$$

with equality holding if and only if $\mathscr{C}(X_2) \subset \mathscr{C}(W_m) = \mathscr{C}(X_1 : \Sigma M_1)$.

In the consistent linear model $\mathscr{F}$, the estimators $Ay$ and $By$ are said to be equal
(with probability 1) if

$$Ay = By \quad \text{for all } y \in \mathscr{C}(W) = \mathscr{C}(X : V) = \mathscr{C}(X : VM), \tag{3.5}$$

while similarly under the model $\mathscr{M}$,

$$Ay = By \quad \text{for all } y \in \mathscr{C}(W_m) = \mathscr{C}(X_1 : \Sigma) = \mathscr{C}(X_1 : \Sigma M_1).$$

In (3.5) we are dealing with the "statistical" equality of the estimators $Ay$ and $By$.
In (3.5) $y$ refers to a vector in $\mathbb{R}^n$, while in the notation $\mathrm{cov}(Ay)$ we understand $y$ as a
random vector. For a review of the equality of two linear estimators, see, e.g., Groß
and Trenkler [4].

Isotalo et al. [10] found conditions under which an arbitrary representation of the
BLUE of $\theta_1 = M_2X_1\beta_1$ under the fixed model $\mathscr{F}$ remains the BLUE for $\theta_1$ under
the mixed model $\mathscr{M}$. This kind of property can be denoted shortly as

$$\{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\}, \quad \text{i.e.,} \quad \mathcal{P}_{\theta_1|\mathscr{F}} \subset \mathcal{P}_{\theta_1|\mathscr{M}},$$

where the sets $\mathcal{P}_{\theta_1|\mathscr{F}}$ and $\mathcal{P}_{\theta_1|\mathscr{M}}$ are defined as in Lemma 1:

$$A \in \mathcal{P}_{\theta_1|\mathscr{F}} \iff A(X_1 : X_2 : VM) = (M_2X_1 : 0 : 0),$$
$$B \in \mathcal{P}_{\theta_1|\mathscr{M}} \iff B(X_1 : \Sigma M_1) = (M_2X_1 : 0).$$

Recall that $\theta_1$ generates all estimable functions in both models in the sense that each
estimable function in both models can be represented as $p'\theta_1$, where $p$ is an arbitrary
vector of proper dimension. Isotalo et al. [10, Sect. 2] proved the following result:


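Lemma 2 The following statements hold:


(a) An arbitrary BLUE for $\theta_1 = M_2X_1\beta_1$ under $\mathscr{F}$ provides also the BLUE for $\theta_1$
under the mixed model $\mathscr{M}$, i.e.,

$$\{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\},$$

i.e., $\mathcal{P}_{\theta_1|\mathscr{F}} \subset \mathcal{P}_{\theta_1|\mathscr{M}}$, holds if and only if

$$\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(X_2 : \Sigma M). \tag{3.7}$$

(b) The reverse relation $\{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\}$, i.e., $\mathcal{P}_{\theta_1|\mathscr{M}} \subset \mathcal{P}_{\theta_1|\mathscr{F}}$, holds if and only if

$$\mathscr{C}(X_2) \subset \mathscr{C}(X_{1\cap2} : \Sigma M_1), \tag{3.8}$$

where the matrix $X_{1\cap2}$ has the property $\mathscr{C}(X_{1\cap2}) = \mathscr{C}(X_1) \cap \mathscr{C}(X_2)$.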


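Actually, the matrix $X_{1\cap2}$ in (b) was erroneously missing in Isotalo et al. [10]. The
inclusion (3.7) is obviously equivalent to $\mathscr{C}(X_{1\cap2} : \Sigma M_1) \subset \mathscr{C}(X_2 : \Sigma M)$ and (3.8)
is equivalent to $\mathscr{C}(X_2 : \Sigma M) \subset \mathscr{C}(X_{1\cap2} : \Sigma M_1)$, and thereby $\mathcal{P}_{\theta_1|\mathscr{F}} = \mathcal{P}_{\theta_1|\mathscr{M}}$ holds
if and only if

$$\mathscr{C}(X_{1\cap2} : \Sigma M_1) = \mathscr{C}(X_2 : \Sigma M).$$

For further equalities arising from models $\mathscr{F}$ and $\mathscr{M}$ see Haslett et al. [7].
The following lemma appears useful for our considerations.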


Lemma 3 Using the earlier notation, the following statements hold:


(a) $M = I_n - P_{(X_1 : X_2)} = I_n - (P_{X_2} + P_{M_2X_1}) = M_2Q_{M_2X_1} = Q_{M_2X_1}M_2$.
(b) $r(M_2X_1) = r(X_1) - \dim \mathscr{C}(X_1) \cap \mathscr{C}(X_2)$.
(c) $\mathscr{C}(W_m^+X_1)^\perp = \mathscr{C}(W_mM_1 : Q_{W_m}) = \mathscr{C}(\Sigma M_1 : Q_{W_m})$.
(d) $\mathscr{C}(X_2 : \Sigma M) = \mathscr{C}(\dot{M}_2X_1 : Q_W)^\perp$.
(e) $\mathscr{C}[A(A'B^\perp)^\perp] = \mathscr{C}(A) \cap \mathscr{C}(B)$.
(f) For $W \in \mathcal{W}(V)$, we have

$$G_{\mu|\mathscr{F}} = X(X'W^-X)^-X'W^+ = P_W - VM(MVM)^+M. \tag{3.9}$$


For part (b), see, e.g., Marsaglia and Styan [14, Corollary 6.2]. For (c), see, e.g.,
Rao [17, Sect. 2]. For (d), see Isotalo et al. [10, Lemma, p. 72], for (e), see Rao and
Mitra [18, Compl. 7, p. 118], and for (a) and (f), see Puntanen et al. [16, Theorem 8,
Proposition 15.2]. Notice from (3.9) that the actual choice of $W$ does not matter, as
long as $W \in \mathcal{W}(V)$.

Our main goal in this paper is to introduce, in different situations, upper bounds
for the Euclidean norm of the distance between the BLUEs of $\theta_1$ under $\mathscr{F}$ and
under $\mathscr{M}$:

$$\|G_{\theta_1|\mathscr{F}}\,y - G_{\theta_1|\mathscr{M}}\,y\|_2, \tag{3.10}$$

where $\|x\|_2 = \sqrt{x'x}$, when $x \in \mathbb{R}^n$. The distance in (3.10) depends on the value of
$y$ and we will consider two cases:

$$y \in \mathscr{C}(W) = \mathscr{C}(X_1 : X_2 : V), \qquad y \in \mathscr{C}(W_m) = \mathscr{C}(X_1 : X_2D : V).$$


3.2 Upper Bounds for the BLUEs 


Let $\mathrm{ch}_1(\cdot)$ denote the largest eigenvalue of the matrix argument. Now the matrix
norm corresponding to the Euclidean vector norm is defined as a spectral norm, see,
e.g., Horn and Johnson [9, Sect. 5.6],

$$|||A||| = \sqrt{\mathrm{ch}_1(A'A)} = \max_{\|x\|_2 = 1} \|Ax\|_2$$

or

$$|||A||| = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}.$$

Then it is easy to see that

$$\|Ax\|_2 \le |||A||| \cdot \|x\|_2 \tag{3.11}$$

holds. Notice that for a (nonzero) nonnegative definite matrix $A$ we have $|||A^+||| = 1/\lambda$,
where $\lambda$ is the smallest nonzero eigenvalue of $A$.
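The three facts used repeatedly below (the eigenvalue form of the spectral norm, inequality (3.11), and the norm of the Moore-Penrose inverse of a nonnegative definite matrix) can be illustrated with a short NumPy sketch. The matrices are random and purely hypothetical; the explicit `rcond` cutoff is a choice made here so that numerically zero singular values are not inverted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spectral norm as the square root of the largest eigenvalue of A'A.
A = rng.standard_normal((5, 3))
spec_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())

# Inequality (3.11): ||Ax||_2 <= |||A||| * ||x||_2.
x = rng.standard_normal(3)
lhs = np.linalg.norm(A @ x)
rhs = spec_norm * np.linalg.norm(x)

# Nonnegative definite N of rank 2: |||N^+||| = 1 / (smallest nonzero eigenvalue).
L = rng.standard_normal((4, 2))
N = L @ L.T
eigs = np.linalg.eigvalsh(N)
lam = eigs[eigs > 1e-10].min()
norm_pinv = np.linalg.norm(np.linalg.pinv(N, rcond=1e-10), 2)
```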


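Theorem 1 Consider the models $\mathscr{F}$ and $\mathscr{M}$. Then for every $y = X_1\mathbf{a} + X_2\mathbf{b} + VM\mathbf{c} \in \mathscr{C}(W)$ we have

$$\|G_{\theta_1|\mathscr{F}}\,y - G_{\theta_1|\mathscr{M}}\,y\|_2^2 \le |||A|||^2\,|||B|||^2\,|||C|||^2\,\|\mathbf{b}\|_2^2 = \frac{a \cdot c}{b^2}\,\mathbf{b}'\mathbf{b} =: \alpha,$$

where $A = M_2X_1$, $B = (X_1'W_m^+X_1)^+$, $C = X_1'W_m^+X_2$, $a = \mathrm{ch}_1(A'A)$, $c = \mathrm{ch}_1(C'C)$,
and $b$ is the smallest nonzero eigenvalue of $X_1'W_m^+X_1$.


The upper bound $\alpha$ is zero for all $\mathbf{b} \in \mathbb{R}^{p_2}$ if and only if $a$ or $c$ is zero:

1. $a = 0 \iff \mathscr{C}(X_1) \subset \mathscr{C}(X_2)$,
2. $c = 0 \iff \mathscr{C}(X_2) \subset \mathscr{C}(\Sigma M_1 : Q_{W_m})$.


Moreover, the following three statements are equivalent:

(i) $\alpha = 0$ for all $\mathbf{b} \in \mathbb{R}^{p_2}$,
(ii) $G_{\theta_1|\mathscr{F}}W = G_{\theta_1|\mathscr{M}}W$,
(iii) $G_{\theta_1|\mathscr{M}}\,y = \mathrm{BLUE}(\theta_1 \mid \mathscr{F})$, i.e., $G_{\theta_1|\mathscr{M}} \in \mathcal{P}_{\theta_1|\mathscr{F}}$.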


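Proof Let $y = X_1\mathbf{a} + X_2\mathbf{b} + VM\mathbf{c} = (X_1 : X_2 : VM)\mathbf{t}$ for some $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ so that
$\mathbf{t} = (\mathbf{a}', \mathbf{b}', \mathbf{c}')'$. Then

$$\begin{aligned}
\mathbf{d} &= G_{\theta_1|\mathscr{F}}\,y - G_{\theta_1|\mathscr{M}}\,y \\
&= (G_{\theta_1|\mathscr{F}} - G_{\theta_1|\mathscr{M}})(X_1 : X_2 : VM)\mathbf{t} \\
&= [0 : (G_{\theta_1|\mathscr{F}} - G_{\theta_1|\mathscr{M}})X_2 : 0]\,\mathbf{t} \\
&= -G_{\theta_1|\mathscr{M}}X_2\mathbf{b} \\
&= -M_2X_1(X_1'W_m^+X_1)^+X_1'W_m^+X_2\mathbf{b}.
\end{aligned}$$

Thus for the squared norm we get

$$\begin{aligned}
\|\mathbf{d}\|_2^2 &= \|M_2X_1(X_1'W_m^+X_1)^+X_1'W_m^+X_2\mathbf{b}\|_2^2 \\
&\le |||M_2X_1|||^2\,|||(X_1'W_m^+X_1)^+|||^2\,|||X_1'W_m^+X_2|||^2\,\|\mathbf{b}\|_2^2 \\
&= \frac{a \cdot c}{b^2}\,\mathbf{b}'\mathbf{b} = \alpha,
\end{aligned} \tag{3.12}$$

where $A = M_2X_1$, $B = (X_1'W_m^+X_1)^+$, $C = X_1'W_m^+X_2$, $a = \mathrm{ch}_1(A'A)$, $c = \mathrm{ch}_1(C'C)$,
and $b$ is the smallest nonzero eigenvalue of $X_1'W_m^+X_1$. The inequality in
(3.12) follows from the consistency and multiplicativity of the matrix norm $|||\cdot|||$;
see (3.11) and Ben-Israel and Greville [3, pp. 19-20].

The upper bound $\alpha$ is zero for all $\mathbf{b}$ if and only if $a$ or $c$ is zero. Obviously

$$a = 0 \iff M_2X_1 = 0 \iff \mathscr{C}(X_1) \subset \mathscr{C}(X_2).$$

The equality $X_1'W_m^+X_2 = 0$ is equivalent to

$$\mathscr{C}(X_2) \subset \mathscr{C}(W_m^+X_1)^\perp = \mathscr{C}(W_mM_1 : Q_{W_m}) = \mathscr{C}(\Sigma M_1 : Q_{W_m}),$$

where we have used part (c) of Lemma 3. Hence

$$c = 0 \iff \mathscr{C}(X_2) \subset \mathscr{C}(\Sigma M_1 : Q_{W_m}).$$

The equivalence of (i), (ii), and (iii) is clear as (iii) holds if and only if

$$G_{\theta_1|\mathscr{M}}(X_1 : X_2 : VM) = (M_2X_1 : 0 : 0),$$

which further is equivalent to $G_{\theta_1|\mathscr{M}}X_2 = 0$. This completes the proof.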


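Theorem 2 Consider the models $\mathscr{F}$ and $\mathscr{M}$. Then for every $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b} \in \mathscr{C}(W_m)$ we have

$$\|G_{\theta_1|\mathscr{F}}\,y - G_{\theta_1|\mathscr{M}}\,y\|_2^2 \le |||A|||^2\,|||B|||^2\,|||C|||^2\,\mathbf{b}'M_1\mathbf{b} = \frac{a \cdot c}{b^2}\,\mathbf{b}'M_1\mathbf{b} =: \gamma,$$

where $A = M_2X_1$, $B = (X_1'\dot{M}_2X_1)^+$, $C = X_1'\dot{M}_2\Sigma M_1$, $a = \mathrm{ch}_1(A'A)$, $c = \mathrm{ch}_1(C'C)$,
and $b$ is the smallest nonzero eigenvalue of $X_1'\dot{M}_2X_1$.
The upper bound $\gamma$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$ if and only if any of the following
equivalent conditions holds:

1. $G_{\theta_1|\mathscr{F}}W_m = G_{\theta_1|\mathscr{M}}W_m$,
2. $X_1'\dot{M}_2\Sigma M_1 = 0$,
3. $\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(X_2 : \Sigma M)$,
4. $\{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\}$.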


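Proof If $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b}$ for some $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{t} = (\mathbf{a}', \mathbf{b}')'$, then

$$\begin{aligned}
\mathbf{d} &= (G_{\theta_1|\mathscr{F}} - G_{\theta_1|\mathscr{M}})(X_1 : \Sigma M_1)\mathbf{t} \\
&= (0 : G_{\theta_1|\mathscr{F}}\Sigma M_1)\mathbf{t} = G_{\theta_1|\mathscr{F}}\Sigma M_1\mathbf{b} \\
&= M_2X_1(X_1'\dot{M}_2X_1)^+X_1'\dot{M}_2\Sigma M_1\mathbf{b}.
\end{aligned}$$

Hence

$$\begin{aligned}
\|\mathbf{d}\|_2^2 &= \|G_{\theta_1|\mathscr{F}}\,y - G_{\theta_1|\mathscr{M}}\,y\|_2^2 \\
&\le |||M_2X_1|||^2\,|||(X_1'\dot{M}_2X_1)^+|||^2\,|||X_1'\dot{M}_2\Sigma M_1|||^2\,\mathbf{b}'M_1\mathbf{b} \\
&= |||A|||^2\,|||B|||^2\,|||C|||^2\,\mathbf{b}'M_1\mathbf{b} \\
&= \frac{a \cdot c}{b^2}\,\mathbf{b}'M_1\mathbf{b} = \gamma,
\end{aligned}$$

where

$$A = M_2X_1, \quad B = (X_1'\dot{M}_2X_1)^+, \quad C = X_1'\dot{M}_2\Sigma M_1,$$

and $a = \mathrm{ch}_1(A'A)$, $c = \mathrm{ch}_1(C'C)$, and $b$ is the smallest nonzero eigenvalue of
$X_1'\dot{M}_2X_1$.

The upper bound $\gamma$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$ if and only if $A = 0$ or $C = 0$. We
observe that $A = 0 \iff \mathscr{C}(X_1) \subset \mathscr{C}(X_2)$. Thus it is clear that $A = 0 \implies C = 0$.
Now $C = 0$ if and only if $\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(\dot{M}_2X_1)^\perp$, which, in view of $\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(W)$, is equivalent to

$$\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(\dot{M}_2X_1)^\perp \cap \mathscr{C}(W) = \mathscr{C}(\dot{M}_2X_1 : Q_W)^\perp. \tag{3.13}$$

In light of part (d) of Lemma 3, (3.13) can be written as

$$\mathscr{C}(\Sigma M_1) \subset \mathscr{C}(X_2 : \Sigma M). \tag{3.14}$$

Moreover, by part (a) of Lemma 2 the inclusion (3.14) is equivalent to

$$\{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\}.$$

So the proof is completed.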
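The bounds of Theorems 1 and 2 can be tried out on simulated matrices. The following is a minimal sketch under stated assumptions: the design matrices and covariance matrices are random and hypothetical, $V$ and $D$ are taken positive definite, every generalized inverse is replaced by the Moore-Penrose inverse, and the helper names (`pinv`, `proj`, `ch1`, `smallest_nonzero_eig`) are introduced here for illustration only.

```python
import numpy as np

def pinv(A):
    # Explicit cutoff so numerically zero singular values are not inverted.
    return np.linalg.pinv(A, rcond=1e-10)

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ pinv(A)

def ch1(S):
    """Largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(S).max()

def smallest_nonzero_eig(S, tol=1e-10):
    w = np.linalg.eigvalsh(S)
    return w[w > tol].min()

rng = np.random.default_rng(2)
n, p1, p2 = 8, 2, 3
X1, X2 = rng.standard_normal((n, p1)), rng.standard_normal((n, p2))
X = np.hstack([X1, X2])
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                    # hypothetical cov(eps), positive definite
Ld = rng.standard_normal((p2, p2))
D = Ld @ Ld.T + np.eye(p2)                 # hypothetical cov(u)

I = np.eye(n)
M, M1, M2 = I - proj(X), I - proj(X1), I - proj(X2)
Sigma = X2 @ D @ X2.T + V
W = X @ X.T + V
Wm_plus = pinv(X1 @ X1.T + Sigma)
M2dot = M2 @ pinv(M2 @ W @ M2) @ M2

G_F = M2 @ X1 @ pinv(X1.T @ M2dot @ X1) @ X1.T @ M2dot
G_M = M2 @ X1 @ pinv(X1.T @ Wm_plus @ X1) @ X1.T @ Wm_plus
A = M2 @ X1
a = ch1(A.T @ A)

# Theorem 1: y = X1 a + X2 b + V M c lies in C(W).
av, bv, cv = rng.standard_normal(p1), rng.standard_normal(p2), rng.standard_normal(n)
y = X1 @ av + X2 @ bv + V @ M @ cv
C1 = X1.T @ Wm_plus @ X2
b1 = smallest_nonzero_eig(X1.T @ Wm_plus @ X1)
dist1 = np.sum((G_F @ y - G_M @ y) ** 2)
alpha = a * ch1(C1.T @ C1) / b1**2 * (bv @ bv)

# Theorem 2: y = X1 a + Sigma M1 b lies in C(W_m).
bn = rng.standard_normal(n)
y2 = X1 @ av + Sigma @ M1 @ bn
C2 = X1.T @ M2dot @ Sigma @ M1
b2 = smallest_nonzero_eig(X1.T @ M2dot @ X1)
dist2 = np.sum((G_F @ y2 - G_M @ y2) ** 2)
gamma = a * ch1(C2.T @ C2) / b2**2 * (bn @ M1 @ bn)
```

In this setup `dist1` should not exceed `alpha` and `dist2` should not exceed `gamma`; the bounds are typically far from tight because they multiply three separate spectral norms.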


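Theorem 3 Let $y = X_1\mathbf{a} + X_2\mathbf{b} + VM\mathbf{c}$ for some $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, and define

$$\mathbf{d} = G_{\theta_1|\mathscr{F}}\,y - (G_{\theta_1|\mathscr{M}} + EQ_{W_m})\,y,$$

where $E \in \mathbb{R}^{n \times n}$ is an arbitrary matrix.


1. The norm $\|\mathbf{d}\|_2$ has no upper bound unless

$$\mathscr{C}(X_2) \subset \mathscr{C}(W_m), \quad \text{i.e.,} \quad X_2 = X_1A + \Sigma M_1B \tag{3.15}$$

for some $A$ and $B$.
2. Supposing that (3.15) holds, for every $y = X_1\mathbf{a} + X_2\mathbf{b} + VM\mathbf{c}$ we have

$$\|\mathbf{d}\|_2^2 \le |||M_2X_1A|||^2\,\|\mathbf{b}\|_2^2 = a\,\|\mathbf{b}\|_2^2 =: \delta,$$

where $a = \mathrm{ch}_1(A'X_1'M_2X_1A)$. The upper bound $\delta$ for $\|\mathbf{d}\|_2^2$ is zero for all $\mathbf{b} \in \mathbb{R}^{p_2}$
if and only if

$$\mathscr{C}(X_2) \subset \mathscr{C}(X_{1\cap2} : \Sigma M_1),$$

where $\mathscr{C}(X_{1\cap2}) = \mathscr{C}(X_1) \cap \mathscr{C}(X_2)$, i.e.,
$\{\mathrm{BLUE}(\theta_1 \mid \mathscr{M})\} \subset \{\mathrm{BLUE}(\theta_1 \mid \mathscr{F})\}$.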


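Proof Let $y = X_1\mathbf{a} + X_2\mathbf{b} + VM\mathbf{c} = (X_1 : X_2 : VM)\mathbf{t}$ for some $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ so that
$\mathbf{t} = (\mathbf{a}', \mathbf{b}', \mathbf{c}')'$. Then, in view of the triangle inequality,

$$\begin{aligned}
\|\mathbf{d}\|_2 &= \|G_{\theta_1|\mathscr{F}}\,y - (G_{\theta_1|\mathscr{M}} + EQ_{W_m})\,y\|_2 \\
&= \|[G_{\theta_1|\mathscr{F}} - (G_{\theta_1|\mathscr{M}} + EQ_{W_m})](X_1 : X_2 : VM)\,\mathbf{t}\|_2 \\
&= \|[0 : (G_{\theta_1|\mathscr{M}} + EQ_{W_m})X_2 : 0]\,\mathbf{t}\|_2 \\
&= \|(G_{\theta_1|\mathscr{M}} + EQ_{W_m})X_2\mathbf{b}\|_2 \\
&\le \|G_{\theta_1|\mathscr{M}}X_2\mathbf{b}\|_2 + \|EQ_{W_m}X_2\mathbf{b}\|_2 \\
&\le |||G_{\theta_1|\mathscr{M}}X_2|||\,\|\mathbf{b}\|_2 + |||E|||\,|||Q_{W_m}X_2|||\,\|\mathbf{b}\|_2.
\end{aligned} \tag{3.16}$$

Thus $\|\mathbf{d}\|_2$ has no upper bound unless $\mathscr{C}(X_2) \subset \mathscr{C}(W_m)$, i.e.,

$$X_2 = X_1A + \Sigma M_1B = X_1A + W_mM_1B \tag{3.17}$$

for some $A$ and $B$. Substituting (3.17) into (3.16) gives

$$\begin{aligned}
\|\mathbf{d}\|_2^2 &= \|G_{\theta_1|\mathscr{F}}\,y - (G_{\theta_1|\mathscr{M}} + EQ_{W_m})\,y\|_2^2 \\
&= \|G_{\theta_1|\mathscr{M}}X_2\mathbf{b}\|_2^2 \\
&= \|M_2X_1(X_1'W_m^+X_1)^+X_1'W_m^+X_1A\mathbf{b}\|_2^2 \\
&= \|M_2X_1A\mathbf{b}\|_2^2 \\
&\le |||M_2X_1A|||^2\,\|\mathbf{b}\|_2^2 = a\,\|\mathbf{b}\|_2^2 = \delta,
\end{aligned}$$

where $a = \mathrm{ch}_1(A'X_1'M_2X_1A)$. Thus, supposing that (3.17) holds, $\|\mathbf{d}\|_2^2$ has the upper
bound $\delta$, which is zero for all $\mathbf{b}$ if and only if $M_2X_1A = 0$, i.e.,

$$A = Q_{X_1'M_2}Z \quad \text{for some } Z. \tag{3.18}$$

Combining (3.17) and (3.18) and noting by part (e) of Lemma 3 that

$$\mathscr{C}(X_1Q_{X_1'M_2}) = \mathscr{C}(X_1) \cap \mathscr{C}(X_2),$$

yields

$$\mathscr{C}(X_2) \subset \mathscr{C}(X_{1\cap2} : \Sigma M_1), \quad \text{where } \mathscr{C}(X_{1\cap2}) = \mathscr{C}(X_1) \cap \mathscr{C}(X_2),$$

which by part (b) of Lemma 2 is equivalent to $\mathcal{P}_{\theta_1|\mathscr{M}} \subset \mathcal{P}_{\theta_1|\mathscr{F}}$.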


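3.3 Upper Bounds Related to OLSE


The ordinary least squares estimators, OLSEs, for $\theta_1 = M_2X_1\beta_1$ under $\mathscr{F}$ and $\mathscr{M}$,
respectively, are defined as

$$Ay = \mathrm{OLSE}(\theta_1 \mid \mathscr{F}) \iff A(X_1 : X_2 : M) = (M_2X_1 : 0 : 0),$$
$$By = \mathrm{OLSE}(\theta_1 \mid \mathscr{M}) \iff B(X_1 : M_1) = (M_2X_1 : 0).$$

Thus, denoting $H_1 = P_{X_1}$, we obviously have

$$P_{M_2X_1}\,y = \mathrm{OLSE}(\theta_1 \mid \mathscr{F}) = \hat{\theta}_1(\mathscr{F}),$$
$$M_2H_1\,y = \mathrm{OLSE}(\theta_1 \mid \mathscr{M}) = \hat{\theta}_1(\mathscr{M}).$$

What is the Euclidean distance between the OLSEs under $\mathscr{M}$ and $\mathscr{F}$? We should note
that there is just one single matrix providing the OLSE in each model, these matrices
being $P_{M_2X_1}$ and $M_2H_1$. These matrices are unique, just like in the BLUE-case the
matrices $G_{\theta_1|\mathscr{F}}$ and $G_{\theta_1|\mathscr{M}}$ are unique. Yet, we can try to find a condition under which

$$\text{(a) } P_{M_2X_1}W_m = M_2H_1W_m \quad \text{or} \quad \text{(b) } P_{M_2X_1}W = M_2H_1W.$$

What about the upper bound for

$$\|(P_{M_2X_1} - M_2H_1)\,y\|_2 \quad \text{when } y \in \mathscr{C}(W_m)? \tag{3.21}$$

In addition to the upper bound for (3.21) we will consider upper bounds for

$$\|\hat{\theta}_1(\mathscr{F}) - \tilde{\theta}_1(\mathscr{F})\|_2 \quad \text{and} \quad \|\hat{\theta}_1(\mathscr{M}) - \tilde{\theta}_1(\mathscr{M})\|_2,$$

where $\tilde{\theta}_1(\mathscr{M}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{M})$ and $\tilde{\theta}_1(\mathscr{F}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{F})$.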


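Theorem 4 Consider the models $\mathscr{F}$ and $\mathscr{M}$. Then for every $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b} \in \mathscr{C}(W_m)$ we have

$$\begin{aligned}
\|\mathrm{OLSE}(\theta_1 \mid \mathscr{F}) - \mathrm{OLSE}(\theta_1 \mid \mathscr{M})\|_2^2 &= \|(P_{M_2X_1} - M_2H_1)\,y\|_2^2 \\
&\le |||A - B|||^2\,\|M_1\mathbf{b}\|_2^2 \\
&= a\,\|M_1\mathbf{b}\|_2^2 =: \xi,
\end{aligned}$$

where $A = P_{M_2X_1}\Sigma M_1$, $B = M_2H_1\Sigma M_1$, and $a = \mathrm{ch}_1[(A - B)(A - B)']$. The upper
bound $\xi$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$ if and only if $A = B$, i.e., $P_{M_2X_1}W_m = M_2H_1W_m$.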


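Proof When $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b} = (X_1 : \Sigma M_1)\mathbf{t}$, where $\mathbf{t} = (\mathbf{a}', \mathbf{b}')'$, we have

$$\begin{aligned}
\mathbf{d} &= (P_{M_2X_1} - M_2H_1)\,y \\
&= (P_{M_2X_1} - M_2H_1)(X_1 : \Sigma M_1)\mathbf{t} \\
&= [0 : (P_{M_2X_1} - M_2H_1)\Sigma M_1]\,\mathbf{t} \\
&= (P_{M_2X_1} - M_2H_1)\Sigma M_1\mathbf{b},
\end{aligned}$$

and hence

$$\begin{aligned}
\|\mathbf{d}\|_2^2 &= \|(P_{M_2X_1} - M_2H_1)\,y\|_2^2 \\
&\le |||P_{M_2X_1}\Sigma M_1 - M_2H_1\Sigma M_1|||^2\,\|M_1\mathbf{b}\|_2^2 \\
&= |||A - B|||^2\,\|M_1\mathbf{b}\|_2^2 \\
&= a\,\|M_1\mathbf{b}\|_2^2 = \xi,
\end{aligned}$$

where $A = P_{M_2X_1}\Sigma M_1$, $B = M_2H_1\Sigma M_1$, and $a = \mathrm{ch}_1[(A - B)(A - B)']$. The upper
bound $\xi$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$ if and only if $A - B = 0$, i.e., $P_{M_2X_1}W_m = M_2H_1W_m$. Thus the proof is completed.


Remark 1 It is straightforward to confirm that $P_{M_2X_1}W = M_2H_1W$ if and only if

$$M_2H_1X_2 = 0 \quad \text{and} \quad M_2H_1\Sigma M = P_{M_2X_1}\Sigma M.$$

The upper bound corresponding to Theorem 4 for every $y \in \mathscr{C}(W)$ can be obtained
(skipped here).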


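Remark 2 We further observe that

$$P_{M_2X_1}\,y = \mathrm{OLSE}(\theta_1 \mid \mathscr{F}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{F})$$

if and only if $P_{M_2X_1}(X_1 : X_2 : \Sigma M) = (M_2X_1 : 0 : 0)$, i.e.,

$$P_{M_2X_1}\Sigma M = 0.$$

Similarly $\mathrm{OLSE}(\theta_1 \mid \mathscr{M}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{M})$ if and only if $P_{M_2X_1}\Sigma M_1 = 0$. Thus we
can conclude that

$$\hat{\theta}_1(\mathscr{M}) = \tilde{\theta}_1(\mathscr{M}) \implies \hat{\theta}_1(\mathscr{F}) = \tilde{\theta}_1(\mathscr{F}).$$


Remark 3 Suppose that $\mathscr{C}(X_2) \subset \mathscr{C}(W_m)$, so that $\mathscr{C}(W_m) = \mathscr{C}(W)$. Then $\mathscr{C}(M_2X_1) \subset \mathscr{C}(W_m)$, and thereby

$$(P_{M_2X_1} - M_2H_1)Q_{W_m} = 0.$$

Hence, if $\mathscr{C}(X_2) \subset \mathscr{C}(W_m)$, the statement

$$(P_{M_2X_1} - M_2H_1)W_m = 0$$

is actually equivalent to $(P_{M_2X_1} - M_2H_1)\,y = 0$ for all $y \in \mathbb{R}^n$, so that $P_{M_2X_1} = M_2H_1$.

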
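Theorem 5 Consider the model $\mathscr{M}$. Then for every $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b}$ we have

$$\begin{aligned}
\|\hat{\theta}_1(\mathscr{M}) - \tilde{\theta}_1(\mathscr{M})\|_2^2 &= \|\mathrm{OLSE}(\theta_1 \mid \mathscr{M}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{M})\|_2^2 \\
&\le |||A|||^2\,\mathbf{b}'M_1\mathbf{b} \\
&= a\,\mathbf{b}'M_1\mathbf{b} =: \phi,
\end{aligned}$$

where $a = \mathrm{ch}_1(A'A)$, $A = M_2H_1\Sigma M_1$. The upper bound $\phi$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$
if and only if $M_2H_1\Sigma M_1 = 0$, i.e., $\hat{\theta}_1(\mathscr{M}) = \tilde{\theta}_1(\mathscr{M})$.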


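Proof Applying part (f) of Lemma 3 we have

$$X_1(X_1'W_m^-X_1)^-X_1'W_m^+ = P_{W_m} - \Sigma M_1(M_1\Sigma M_1)^+M_1. \tag{3.22}$$

Premultiplying (3.22) by $M_2H_1$ we obtain

$$\begin{aligned}
G_{\theta_1|\mathscr{M}}\,y &= M_2H_1y - M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1y \\
&= \hat{\theta}_1(\mathscr{M}) - M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1y \\
&= \tilde{\theta}_1(\mathscr{M}).
\end{aligned}$$

Thus we have

$$\begin{aligned}
\hat{\theta}_1(\mathscr{M}) - \tilde{\theta}_1(\mathscr{M}) &= \mathrm{OLSE}(\theta_1 \mid \mathscr{M}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{M}) \\
&= M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1y,
\end{aligned}$$

and for all $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b}$,

$$\begin{aligned}
\|\hat{\theta}_1(\mathscr{M}) - \tilde{\theta}_1(\mathscr{M})\|_2^2 &= \|\mathrm{OLSE}(\theta_1 \mid \mathscr{M}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{M})\|_2^2 \\
&= \|M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1\Sigma M_1\mathbf{b}\|_2^2 \\
&\le \|M_2H_1\Sigma M_1\mathbf{b}\|_2^2 \\
&\le |||M_2H_1\Sigma M_1|||^2\,\mathbf{b}'M_1\mathbf{b} \\
&= a\,\mathbf{b}'M_1\mathbf{b} = \phi,
\end{aligned}$$

where $a = \mathrm{ch}_1(A'A)$, $A = M_2H_1\Sigma M_1$. The upper bound $\phi$ is zero for all $\mathbf{b} \notin \mathscr{C}(X_1)$
if and only if $M_2H_1\Sigma M_1 = 0$, i.e., $\hat{\theta}_1(\mathscr{M}) = \tilde{\theta}_1(\mathscr{M})$.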
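As an illustration of Theorem 5, the sketch below compares the OLSE $M_2H_1y$ and a BLUE representation of $\theta_1$ under the mixed model for one simulated response. All inputs are hypothetical random matrices, $V$ and $D$ are assumed positive definite, Moore-Penrose inverses stand in for the generalized inverses, and the variable names are chosen here for readability only.

```python
import numpy as np

def pinv(A):
    # Explicit cutoff so numerically zero singular values are not inverted.
    return np.linalg.pinv(A, rcond=1e-10)

rng = np.random.default_rng(5)
n, p1, p2 = 7, 2, 2
X1, X2 = rng.standard_normal((n, p1)), rng.standard_normal((n, p2))
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                    # hypothetical cov(eps)
Ld = rng.standard_normal((p2, p2))
D = Ld @ Ld.T + np.eye(p2)                 # hypothetical cov(u)

I = np.eye(n)
H1 = X1 @ pinv(X1)                         # orthogonal projector onto C(X1)
M1 = I - H1
M2 = I - X2 @ pinv(X2)
Sigma = X2 @ D @ X2.T + V
Wm_plus = pinv(X1 @ X1.T + Sigma)
G_M = M2 @ X1 @ pinv(X1.T @ Wm_plus @ X1) @ X1.T @ Wm_plus

av, bv = rng.standard_normal(p1), rng.standard_normal(n)
y = X1 @ av + Sigma @ M1 @ bv              # a response in C(W_m)

olse = M2 @ H1 @ y
blue = G_M @ y
A = M2 @ H1 @ Sigma @ M1
diff_from_proof = A @ pinv(M1 @ Sigma @ M1) @ M1 @ y   # OLSE - BLUE as in the proof
dist = np.sum((olse - blue) ** 2)
phi = np.linalg.eigvalsh(A.T @ A).max() * (bv @ M1 @ bv)
```

Here `olse - blue` should coincide with `diff_from_proof`, and `dist` should not exceed `phi`.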


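Theorem 6 Consider the model $\mathscr{F}$. Then for every $y = X\mathbf{a} + VM\mathbf{b}$,

$$\begin{aligned}
\|\hat{\theta}_1(\mathscr{F}) - \tilde{\theta}_1(\mathscr{F})\|_2^2 &= \|\mathrm{OLSE}(\theta_1 \mid \mathscr{F}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{F})\|_2^2 \\
&\le |||B|||^2\,\mathbf{b}'M\mathbf{b} \\
&= b\,\mathbf{b}'M\mathbf{b} =: \varrho,
\end{aligned}$$

where $b = \mathrm{ch}_1(B'B)$, $B = P_{M_2X_1}\Sigma M$. The upper bound $\varrho$ is zero for all $\mathbf{b} \notin \mathscr{C}(X)$ if
and only if $P_{M_2X_1}\Sigma M = 0$, i.e., $\hat{\theta}_1(\mathscr{F}) = \tilde{\theta}_1(\mathscr{F})$.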


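Proof It is clear that premultiplying the fundamental BLUE-equation

$$X(X'W^-X)^-X'W^+(X_1 : X_2 : \Sigma M) = (X_1 : X_2 : 0)$$

by $M_2$ yields

$$M_2X(X'W^-X)^-X'W^+(X_1 : X_2 : \Sigma M) = (M_2X_1 : 0 : 0),$$

and so $Cy = \mathrm{BLUE}(\theta_1 \mid \mathscr{F})$, where $C = M_2X(X'W^-X)^-X'W^+$. Moreover, it can
be shown that

$$G_{\theta_1|\mathscr{F}} = M_2X(X'W^-X)^-X'W^+. \tag{3.23}$$

By part (f) of Lemma 3 we have

$$X(X'W^-X)^-X'W^+ = P_W - \Sigma M(M\Sigma M)^+M. \tag{3.24}$$

Premultiplying (3.24) by $M_2P_X$ and using (3.23) gives

$$\begin{aligned}
G_{\theta_1|\mathscr{F}} &= M_2X(X'W^-X)^-X'W^+ \\
&= M_2P_X[P_W - \Sigma M(M\Sigma M)^+M] \\
&= M_2[P_X - P_X\Sigma M(M\Sigma M)^+M] \\
&= M_2[(P_{X_2} + P_{M_2X_1}) - P_X\Sigma M(M\Sigma M)^+M] \\
&= P_{M_2X_1} - P_{M_2X_1}\Sigma M(M\Sigma M)^+M.
\end{aligned}$$

Thus

$$\begin{aligned}
G_{\theta_1|\mathscr{F}}\,y &= P_{M_2X_1}y - P_{M_2X_1}\Sigma M(M\Sigma M)^+My \\
&= \hat{\theta}_1(\mathscr{F}) - P_{M_2X_1}\Sigma M(M\Sigma M)^+My,
\end{aligned}$$

and

$$\begin{aligned}
\hat{\theta}_1(\mathscr{F}) - \tilde{\theta}_1(\mathscr{F}) &= \mathrm{OLSE}(\theta_1 \mid \mathscr{F}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{F}) \\
&= P_{M_2X_1}\Sigma M(M\Sigma M)^+My.
\end{aligned}$$

Now for all $y = X\mathbf{a} + VM\mathbf{b} = X\mathbf{a} + \Sigma M\mathbf{b}$,

$$\begin{aligned}
\|\hat{\theta}_1(\mathscr{F}) - \tilde{\theta}_1(\mathscr{F})\|_2^2 &= \|\mathrm{OLSE}(\theta_1 \mid \mathscr{F}) - \mathrm{BLUE}(\theta_1 \mid \mathscr{F})\|_2^2 \\
&= \|P_{M_2X_1}\Sigma M(M\Sigma M)^+M\Sigma M\mathbf{b}\|_2^2 \\
&\le \|P_{M_2X_1}\Sigma M\mathbf{b}\|_2^2 \\
&\le |||P_{M_2X_1}\Sigma M|||^2\,\mathbf{b}'M\mathbf{b} \\
&= b\,\mathbf{b}'M\mathbf{b} = \varrho,
\end{aligned}$$

where $b = \mathrm{ch}_1(B'B)$, $B = P_{M_2X_1}\Sigma M$. The upper bound $\varrho$ is zero for all $\mathbf{b} \notin \mathscr{C}(X)$
if and only if $P_{M_2X_1}\Sigma M = 0$, i.e., $\hat{\theta}_1(\mathscr{F}) = \tilde{\theta}_1(\mathscr{F})$.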


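Suppose that

$$\mathrm{OLSE}(\theta_1 \mid \mathscr{M}) = \mathrm{OLSE}(\theta_1 \mid \mathscr{F}) \quad \text{for all } y \in \mathscr{C}(W_m)$$

holds, i.e.,

$$P_{M_2X_1}(X_1 : \Sigma M_1) = M_2H_1(X_1 : \Sigma M_1),$$

which, in view of $P_{M_2X_1}X_1 = M_2H_1X_1 = M_2X_1$, is equivalent to

$$P_{M_2X_1}\Sigma M_1 = M_2H_1\Sigma M_1 =: S. \tag{3.25}$$

We may now ask under which condition does the following hold:

$$\mathrm{BLUE}(\theta_1 \mid \mathscr{M}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{F}) \quad \text{for all } y \in \mathscr{C}(W_m).$$

We know that

$$G_{\theta_1|\mathscr{F}}\,y = P_{M_2X_1}y - P_{M_2X_1}\Sigma M(M\Sigma M)^+My,$$
$$G_{\theta_1|\mathscr{M}}\,y = M_2H_1y - M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1y.$$

Thereby, assuming that (3.25) holds, $\mathrm{BLUE}(\theta_1 \mid \mathscr{M}) = \mathrm{BLUE}(\theta_1 \mid \mathscr{F})$ for all $y = X_1\mathbf{a} + \Sigma M_1\mathbf{b}$ if and only if

$$P_{M_2X_1}\Sigma M(M\Sigma M)^+My = M_2H_1\Sigma M_1(M_1\Sigma M_1)^+M_1y,$$

that is,

$$P_{M_2X_1}\Sigma M(M\Sigma M)^+M\Sigma M_1\mathbf{b} = M_2H_1\Sigma M_1\mathbf{b} \quad \text{for all } \mathbf{b} \in \mathbb{R}^n,$$

which by $M = M_1Q_{M_1X_2}$ and (3.25) can be expressed as

$$P_{M_2X_1}\Sigma M_1Q_{M_1X_2}(M\Sigma M)^+M\Sigma M_1 = M_2H_1\Sigma M_1. \tag{3.27}$$

On the basis of (3.25) and (3.27), we can conclude the following result.


Corollary 1 Consider the models $\mathscr{F}$ and $\mathscr{M}$ and assume that $\hat{\theta}_1(\mathscr{M}) = \hat{\theta}_1(\mathscr{F})$ for
all $y \in \mathscr{C}(W_m)$, i.e., (3.25) holds. Then for all $y \in \mathscr{C}(W_m)$,

1. $\hat{\theta}_1(\mathscr{M}) = \tilde{\theta}_1(\mathscr{M}) \implies \hat{\theta}_1(\mathscr{F}) = \tilde{\theta}_1(\mathscr{F})$.
2. $\tilde{\theta}_1(\mathscr{M}) = \tilde{\theta}_1(\mathscr{F}) \iff SQ_{M_1X_2}(M\Sigma M)^+M\Sigma M_1 = S$.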

3.4 Conclusions 


In this paper, we consider the partitioned fixed linear model $\mathscr{F}: y = X_1\beta_1 + X_2\beta_2 + \varepsilon$
and the corresponding mixed model $\mathscr{M}: y = X_1\beta_1 + X_2u + \varepsilon$, where $\varepsilon$ is a random
error vector and $u$ is a random effect vector; in other words, $\mathscr{M}$ is obtained from
$\mathscr{F}$ by replacing the fixed effects $\beta_2$ with the random effects $u$. In both models the
covariance matrix of $\varepsilon$ is $V$ while the covariance matrix of $u$ is $D$. Isotalo et al.
[10] found conditions under which an arbitrary representation of the BLUE of an
estimable parametric function of $\beta_1$ in the fixed model $\mathscr{F}$ remains BLUE in the
mixed model $\mathscr{M}$. The considerations were done by studying the properties of the
BLUE of $\theta_1 = M_2X_1\beta_1$, where $M_2 = I_n - P_{X_2}$.

In this paper, we establish upper bounds for the Euclidean norm of the difference
between the BLUEs of $\theta_1 = M_2X_1\beta_1$ under models $\mathscr{F}$ and $\mathscr{M}$. Some corresponding
bounds are considered also for the difference between the ordinary least squares
estimators, OLSEs, of $\theta_1$. It is a bit tricky to consider the distance between the BLUEs
under $\mathscr{F}$ and $\mathscr{M}$ because under $\mathscr{F}$ the response vector $y$ is supposed to belong
to $\mathscr{C}(W) = \mathscr{C}(X : V)$ while in $\mathscr{M}$ it belongs to $\mathscr{C}(W_m) = \mathscr{C}(X_1 : \Sigma M_1)$, where
$\Sigma = V + X_2DX_2'$. We pick up particular representations for the BLUEs, $G_{\theta_1|\mathscr{F}}\,y$ and
$G_{\theta_1|\mathscr{M}}\,y$, and study the upper bounds for the Euclidean norms of their differences when
$y$ belongs to $\mathscr{C}(W_m)$ or $\mathscr{C}(W)$.

As for related literature, Baksalary and Kala [2, p. 680] provided an upper bound
for the Euclidean distance of the OLSE and the BLUE of $\mu = X\beta$ under $\{y, X\beta, V\}$;
see also Haberman [5], Baksalary and Kala [1], and Trenkler [19, p. 261]. The
corresponding considerations for the BLUEs under two linear models $\{y, X\beta, V_1\}$
and $\{y, X\beta, V_2\}$ have been made by Hauke et al. [8]. Upper bounds arising from
linear models with new future observations have been considered by Haslett et al.
[6, Sect. 3] and Markiewicz and Puntanen [12, 13]. See also Kala et al. [11, Sect. 6],
who considered the distance between the BLUEs under the original and the transformed
models, and Pordzik [15], who studied the distance between restricted and
unrestricted estimators in the general linear model.

In practical applications, the paper provides guidance on the effect on the remain- 
ing fixed parameters of treating some of the parameters in a linear model as fixed 
rather than random, or vice versa. When computing estimates using statistical soft- 
ware, this is usually a very simple programming change with unknown consequences, 
so it is rather too easy to misspecify the correct model. Our paper provides useful 
guidance and upper bounds on the consequences of such misspecification. 


Acknowledgements Thanks to the anonymous referee for constructive comments.


4.1 TU Cooperative Game


A transferable utility cooperative game is defined by


(i) Player set $N = \{1, 2, \ldots, n\}$.
(ii) Coalitions: $S \subset N$.
(iii) Worth of a coalition $S$: $v(S) \in \mathbb{R}$.
(iv) Worth of the empty coalition: $v(\emptyset) = 0$.


A preimputation is any real vector $x = (x_1, x_2, \ldots, x_n)$ with $x(N) = v(N)$, where $x(S) = \sum_{i \in S} x_i$.
An imputation is any real vector $x = (x_1, x_2, \ldots, x_n)$ with $x_i \ge v(\{i\})$, $i = 1, 2, \ldots, n$,
and $x(N) = v(N)$.
The satisfaction to a coalition $S$ with imputation $x$ is given by $f(x, S) = x(S) - v(S)$.


Example 1


$N = \{1, 2, 3\}$, $v(\{1\}) = v(\{2\}) = v(\{3\}) = 0$, $v(\{1, 2\}) = 9$, $v(\{1, 3\}) = 12$,
$v(\{2, 3\}) = 15$, $v(\{1, 2, 3\}) = 20$. The satisfactions for the imputations $x = (7, 9, 4)$ and
$y = (2, 8, 10)$ at $S = \{2, 3\}$ are respectively $f(x, S) = (9 + 4) - 15 = -2$ and
$f(y, S) = 8 + 10 - 15 = 3$.
We can write down the satisfactions as a vector in $2^n$-dimensional space. For our example
above, we have the following satisfactions for each imputation.


Coalition S    x(S)    v(S)    f(x, S) = x(S) - v(S)    y(S)    f(y, S)
∅                0       0              0                 0        0
1                7       0              7                 2        2
2                9       0              9                 8        8
3                4       0              4                10       10
12              16       9              7                10        1
13              11      12             -1                12        0
23              13      15             -2                18        3
123             20      20              0                20        0


One can rearrange the satisfactions in an increasing order for each imputation. Let $\theta$
denote such an arrangement; we have

$$\theta(x) = (-2, -1, 0, 0, 4, 7, 7, 9) \quad \text{and} \quad \theta(y) = (0, 0, 0, 1, 2, 3, 8, 10).$$

We say $y \succeq x$ (imputation $y$ is lexicographically better than $x$) when $\theta(y) \ge_L \theta(x)$.
Here, $\theta(y) \ge_L \theta(x)$. Thus, $x \preceq y$ if and only if $\theta(x) \le \theta(y)$ lexicographically.
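The computation in Example 1 is easy to reproduce. The sketch below (function names `satisfaction` and `theta` are chosen here, not taken from the text) enumerates all coalitions, sorts the satisfactions, and relies on the fact that Python compares lists lexicographically.

```python
from itertools import combinations

players = (1, 2, 3)
# Characteristic function of Example 1, keyed by sorted tuples of players.
v = {(): 0, (1,): 0, (2,): 0, (3,): 0,
     (1, 2): 9, (1, 3): 12, (2, 3): 15, (1, 2, 3): 20}

def satisfaction(x, S):
    """f(x, S) = x(S) - v(S)."""
    return sum(x[i - 1] for i in S) - v[S]

def theta(x):
    """Satisfactions of all 2^n coalitions, arranged in increasing order."""
    coalitions = [S for r in range(len(players) + 1)
                  for S in combinations(players, r)]
    return sorted(satisfaction(x, S) for S in coalitions)

x = (7, 9, 4)
y = (2, 8, 10)
theta_x = theta(x)            # [-2, -1, 0, 0, 4, 7, 7, 9]
theta_y = theta(y)            # [0, 0, 0, 1, 2, 3, 8, 10]
y_is_better = theta_y > theta_x   # list comparison is lexicographic
```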


Let $X$ be either the set of preimputations or imputations. Then


Definition 4.1 (Schmeidler [25]) The nucleolus of the TU game $(v, N)$ over the set
$X$ is $\nu(X) = \{x \in X : \text{if } y \in X \text{ then } \theta(y) \le_L \theta(x)\}$.


For $X$ one may also consider other sets such as $\{x : \sum_{i \in T_1} x_i = v(T_1), \sum_{i \in T_2} x_i = v(T_2), \ldots, \sum_{i \in T_k} x_i = v(T_k)\}$, where $\{T_1, T_2, \ldots, T_k\}$ is a partition of $N = \{1, 2, \ldots, n\}$.


Theorem 4.1 If $X$ is non-empty and compact, then $\nu(X)$ is also non-empty and
compact.


Proof Let $X^1 = \{x : x \in X,\ \theta_1(y) \le \theta_1(x)\ \forall y \in X\}$. Since $\theta_1$ is a continuous function
of $x$, it attains its maximum on the compact set $X$, and the set $X^1$ of maximizers is non-empty and compact. By
induction, $X^k = \{x : x \in X^{k-1},\ \theta_k(y) \le \theta_k(x)\ \forall y \in X^{k-1}\}$ is non-empty and
compact for $k = 1, 2, \ldots, 2^n$.
We claim $\nu(X) = X^{2^n}$. Suppose $\nu(X) \ne X^{2^n}$. This means we have a $y \in X - X^{2^n}$,
$x \in X^{2^n}$ such that $\theta(y) \ge_L \theta(x)$. Let $k$ be the smallest integer such that
$y \notin X^k$. Then $\theta_l(y) = \theta_l(x)$ if $l < k$. However, $\theta_k(y) < \theta_k(x)$ and this contradiction
shows $\nu(X) = X^{2^n}$.


The interesting aspect of $\nu(X)$ is the following:

Theorem 4.2 If $X$ is the imputation set, then $\nu(X)$ is a unique imputation.

The following lemma is needed for the theorem.


Lemma 4.1 Let $\theta(x) = \theta(y)$, $x \ne y$. Then for any $0 < \lambda < 1$, $\theta$ is quasi-concave,
i.e., $\theta(\lambda x + (1 - \lambda)y) >_L \theta(x)$.


Proof Let $z = \lambda x + (1 - \lambda)y$. We have

$$\theta(x) = (f(x, S_1), f(x, S_2), \ldots, f(x, S_{2^n})),$$
$$\theta(y) = (f(y, T_1), f(y, T_2), \ldots, f(y, T_{2^n})).$$

Here, $S_1, S_2, \ldots, S_{2^n}$ are a permutation of all possible coalitions in $N$, with $f(x, S_1) \le f(x, S_2) \le \cdots \le f(x, S_{2^n})$, and similarly $T_1, T_2, \ldots, T_{2^n}$ is yet another permutation
of all possible coalitions in $N$ with $f(y, T_1) \le f(y, T_2) \le \cdots \le f(y, T_{2^n})$. We will
assume that in case $f(x, S_k) = f(x, S_{k+1})$, their naming satisfies $f(y, S_k) \le f(y, S_{k+1})$.

Since $x \ne y$ we have $\sum_{i \in S} x_i \ne \sum_{i \in S} y_i$ for some coalition $S$ and hence $f(x, S) \ne f(y, S)$. Let $S = S_c$ be the index of $S$ according to $\theta(x)$, so that $f(x, S_c) \ne f(y, S_c)$. We can find such an $S_c$ with $c$ as the first index with this property.

Let $f(x, S_c) = q$. For any coalition $S$, and $z = \lambda x + (1 - \lambda)y$,

$$f(z, S) = \lambda x(S) + (1 - \lambda)y(S) - v(S) = \lambda f(x, S) + (1 - \lambda)f(y, S).$$

For any $k < c$ we have $f(z, S_k) = f(x, S_k)$. Consider any $k \ge c$. We claim that
$f(y, S_k) \ge f(x, S_c)$. Suppose not. Then $f(y, S_k) < f(x, S_c)$. In fact if $f(x, S) < f(x, S_c) = q$, then $S = S_k$ and $k < c$ and by assumption $f(y, S) = f(x, S) = f(x, S_k) < q$, while in particular $f(y, S_c) > f(x, S_c) = q$. Thus for $k \ge c$ either
$f(x, S_k) > q$ and $f(y, S_k) \ge q$, or $f(x, S_k) = q$ and $f(y, S_k) > q$. Thus $f(z, S_k) > q$. We have $f(z, S_k) = f(x, S_k)$ for $k < c$ and $f(z, S_k) > q$ for $k \ge c$, i.e.,

$$\theta_k(z) = \theta_k(x) \text{ for } 1 \le k \le c - 1, \qquad \theta_c(z) > \theta_c(x), \quad \text{i.e., } \theta(z) >_L \theta(x).$$

When $x \ne y$ and $0 < \lambda < 1$, for $z = \lambda x + (1 - \lambda)y$ we have $\theta(z) >_L \theta(x)$. This proves that
$\nu(X)$ is a unique point in $X$.


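Some special cases where one can explicitly express the nucleolus as a fixed
form of the data defining the game are worth investigating. We will start with simple
examples.


Example 2


Let $v$ be a 3-person simple majority game. We have

$$X = \{(x_1, x_2, x_3) : \textstyle\sum_i x_i = 1,\ x_i \ge 0,\ i = 1, 2, 3\}.$$

We will check that $x^* = (\tfrac13, \tfrac13, \tfrac13)$ is the nucleolus.

Let $y = (y_1, y_2, y_3) \ne x^*$ be any imputation. W.l.g. $y_1 = \tfrac13 + \epsilon_1$, $y_2 = \tfrac13 + \epsilon_2$,
$y_3 = \tfrac13 - \epsilon_1 - \epsilon_2$, where $\epsilon_1, \epsilon_2 \ge 0$. Similarly, let $z_1 = \tfrac13 - \delta_1$, $z_2 = \tfrac13 - \delta_2$, $z_3 = \tfrac13 + \delta_1 + \delta_2$.
In either case the least entry in $\theta(y)$ or $\theta(z)$ is $< -\tfrac13$, which is the least entry of $\theta(x^*)$
(attained at the two-player coalitions). Thus $(\tfrac13, \tfrac13, \tfrac13)$ is
the nucleolus.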
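A quick numerical sanity check of this example, written as a sketch rather than a proof: it assumes the usual 3-person simple majority game ($v(S) = 1$ when $|S| \ge 2$ and $0$ otherwise), samples random imputations from the simplex, and confirms that none of them is lexicographically better than the equal split. The function name `theta` and the sampling scheme are choices made here.

```python
from itertools import combinations
import random

def theta(x):
    """Sorted satisfactions x(S) - v(S) for the 3-person simple majority game,
    where v(S) = 1 if |S| >= 2 and 0 otherwise (assumed form of the game)."""
    values = []
    for r in range(4):
        for S in combinations(range(3), r):
            worth = 1.0 if len(S) >= 2 else 0.0
            values.append(sum(x[i] for i in S) - worth)
    return sorted(values)

x_star = (1 / 3, 1 / 3, 1 / 3)
theta_star = theta(x_star)

random.seed(0)
beaten = 0
for _ in range(1000):
    u = sorted(random.random() for _ in range(2))
    y = (u[0], u[1] - u[0], 1 - u[1])      # random point of the simplex
    if theta(y) > theta_star:              # lexicographic comparison of lists
        beaten += 1
```

After the loop `beaten` stays at zero: every sampled imputation has a smallest satisfaction below $-\tfrac13$.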


Example 3 [7] 


A landlord who is handicapped owns all the lands in the village. He can hire $n$ (homogeneous) landless laborers and harvest $f(n)$ bags of rice. Suppose $f(x)$ is an increasing concave function. Then the TU game that we can define is

$$v(S) = \begin{cases} f(|S| - 1) & \text{if } S \ni \text{landlord} \\ 0 & \text{if } S \not\ni \text{landlord.} \end{cases}$$

Find the nucleolus for this game with $n + 1$ players.


Lemma 4.2 The game has a non-empty core, where the core of a game $(v, N)$ is defined as

$$\{x : x_i \ge v(i)\ \forall i \in N\} \cap \{x : \textstyle\sum_{i \in S} x_i \ge v(S)\ \forall S \subset N\}.$$

(In general the core could be empty.)


Proof Consider the preimputation

$$x(w_j) = \frac{f(n) - f(n-1)}{2}, \quad j = 1, 2, \ldots, n,$$

$$x(L) = f(n) - n\left[\frac{f(n) - f(n-1)}{2}\right].$$
Fig. 4.1 Bags of rice vs number of laborers


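We will check that it is an imputation. Since $\frac{f(n) - f(n-1)}{2} \ge 0$, it is enough to check that $x(L) = f(n) - n\left[\frac{f(n) - f(n-1)}{2}\right] \ge 0$. We will actually show that

$$x(L) = f(n) - n\left[\frac{f(n) - f(n-1)}{2}\right] \ge \frac{f(n) - f(n-1)}{2} \ge 0.$$

That is, the landlord gets at least what the laborers are offered. Since $f(0) = 0$ and $f(n)$ is concave, we have from Fig. 4.1,

$$\frac{f(n-1)}{n-1} \ge \frac{f(n)}{n} \ge \frac{f(n+1)}{n+1},$$

i.e. $(n+1) f(n-1) \ge (n-1) f(n) = (n+1) f(n) - 2 f(n)$.

Thus $2 f(n) \ge (n+1)[f(n) - f(n-1)]$,

i.e. $f(n) \ge (n+1)\left[\frac{f(n) - f(n-1)}{2}\right] \implies f(n) - n\left[\frac{f(n) - f(n-1)}{2}\right] \ge \frac{f(n) - f(n-1)}{2} \ge 0.$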
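As a numerical illustration of Lemma 4.2, the sketch below builds the landlord game for one assumed concave production function, $f(k) = \sqrt{k}$ with $n = 5$ laborers (both chosen here only for illustration), and checks by brute force that the half-marginal-product allocation lies in the core. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def landlord_game(f):
    """Player 0 is the landlord; any other player is a laborer."""
    def v(S):
        return f(len(S) - 1) if 0 in S else 0.0
    return v

def half_marginal_allocation(f, n):
    # Each laborer gets half of the last laborer's marginal product.
    wage = (f(n) - f(n - 1)) / 2
    return [f(n) - n * wage] + [wage] * n

def in_core(x, v, tol=1e-9):
    players = tuple(range(len(x)))
    if abs(sum(x) - v(players)) > tol:          # efficiency
        return False
    return all(sum(x[i] for i in S) >= v(S) - tol   # coalitional rationality
               for size in range(1, len(x))
               for S in combinations(players, size))

n = 5
x = half_marginal_allocation(sqrt, n)
print(x[0] >= x[1], in_core(x, landlord_game(sqrt)))
```

The brute-force check enumerates all coalitions, so it is only meant for small $n$.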


Remark 4.1 What happens when workers are highly productive and the output is an increasing convex function? In this case, one can show that the landlord would check whether paying each worker half of the last worker's marginal contribution leaves him a rent at least as large as that wage. When it does, he will take it. Otherwise, the nucleolus rent and wage will both be $\frac{f(n)}{n+1}$, namely possession of land has the same value as the worker's wage.

One can also analyze the case where there are two lame brothers who own all the 
land in the village and are trying to compete with each other for the n homogeneous 
landless laborers. When the production function f(x) is concave, they basically have 
to compete for sharing the workers. Since the marginal contribution is a decreas- 
ing function when f is concave, the sharing of workers at the nucleolus price will 
increase the wage close to Loma fin) where 2m = n(say) where the landowners 
employ equal numbers of workers. For finer details on locating the nucleolus, see Chetty
et al. [7].


50 T. E. S. Raghavan 


4.2 Nucleolus for the Talmud Game [2] 


An orthodox Jew borrowed $1000, $2000 and $3000 from 3 banks and spent $5000 
for personal use and died suddenly. His orthodox wife finds just $1000 in his wallet. 
The 3 banks want to get the entire $1000 for themselves. She seeks the advice of her 
lawyer who is also committed to Jewish law. According to him, the problem has to 
be resolved as enunciated in the Talmud. The Talmud (Baba Metzia) talks about the 
following example. 

Contested Garment Principle: Two sisters present wills in a (Jewish) court. The elder
one says that she is entitled to her mom's marriage cloth (imagine it as an expensive
silk saree from Kancheepuram). The younger sister shows her mother's will that the
two sisters share the saree equally.

The underlying Jewish principle is that no one can fight for what one is not entitled
to. The judge says that the second daughter is not entitled to more than 1/2 according
to the will. Thus, the judge says that the elder daughter gets at least 1/2 the saree, and
the debate is only over the other half on which both daughters' wills say they are the
owners. Thus, the judge says a 50:50 split of the other half is the most unbiased,
giving 3/4 of the saree to the elder daughter and 1/4 of the saree to the younger one.

Using this principle, the judge wants to resolve the bank loan problem. Think of the banks as players 1, 2, 3 and let $v(S)$ be what would remain for coalition $S$, namely $\max(1000 - \sum_{j \notin S} d_j, 0)$. That is, each coalition of bankers can fight over what would remain after closing the debts outside the coalition. In this case, $v(1) = \max(1000 - (2000 + 3000), 0) = 0$. Then $v(1) = v(2) = v(3) = v(1\,2) = v(1\,3) = v(2\,3) = 0$, $v(1\,2\,3) = 1000$.

In this case, the judge suggested equal division of $1000 among the 3 banks. 
Incidentally, this is also the nucleolus for this game v with N = {1, 2, 3}. 

Suppose the man who borrowed money from the 3 banks spent $4000 leaving 
$2000 in his wallet. We can write down 


$v(1) = \max\{2000 - d_2 - d_3, 0\} = 0$, where $d_1 = 1000$, $d_2 = 2000$, $d_3 = 3000$,
$v(2) = \max\{2000 - d_1 - d_3, 0\} = 0$,
$v(3) = \max\{2000 - d_1 - d_2, 0\} = 0$,

$v(1\,2) = 0$, $v(1\,3) = 0$, $v(2\,3) = 1000$, $v(1\,2\,3) = 2000$.
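The worths listed above can be generated mechanically. A minimal sketch, assuming the characteristic function $v(S) = \max(E - d(N \setminus S), 0)$ used in this section; the function name and the dictionary representation are illustrative choices.

```python
from itertools import combinations

def bankruptcy_game(estate, claims):
    """Return {coalition: worth}, where a coalition gets what is left of the
    estate after the claims of everybody outside it are paid in full.
    Players are numbered from 1, coalitions are sorted tuples."""
    players = range(1, len(claims) + 1)
    total = sum(claims)
    game = {}
    for size in range(1, len(claims) + 1):
        for S in combinations(players, size):
            inside = sum(claims[i - 1] for i in S)
            game[S] = max(estate - (total - inside), 0)
    return game

v = bankruptcy_game(2000, [1000, 2000, 3000])
print(v[(1, 2)], v[(1, 3)], v[(2, 3)], v[(1, 2, 3)])
```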


How could we use the contested garment principle to resolve the problem with three persons?

Let us revert to a two-creditor case. Suppose $E$ is the money in the wallet of the deceased person and suppose $d_1, d_2$ are the loans from the two banks. The amount each bank $i$ concedes to the other bank $j$ is $\max\{E - d_i, 0\}$. Let $\theta_i^+ = \max\{E - d_i, 0\}$. Thus, bank 1 concedes to bank 2 an amount $\theta_1^+$. Bank 2 concedes to bank 1 an amount $\theta_2^+$. The amount at issue is $E - \theta_1^+ - \theta_2^+$. It is shared equally between the 2 bankers and each bank receives in addition what the opponent bank has conceded. Thus, the amount is
$$x_1 = \frac{E - \theta_1^+ - \theta_2^+}{2} + \theta_2^+, \qquad x_2 = \frac{E - \theta_1^+ - \theta_2^+}{2} + \theta_1^+$$
for banks 1 and 2, respectively.
We will say that this division of $E$ for claims $d_1, d_2$ is prescribed by the contested garment principle.
If we view this solution as a function of $E$, for small values of $E$, $\theta_1^+, \theta_2^+$ will be 0 and $x_1 = x_2 = \frac{E}{2}$. Let $d_1 < d_2$ and suppose $d_1 < E < d_2$. Then


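$$\theta_1^+ = E - d_1 \quad \text{and} \quad \theta_2^+ = 0.$$

Thus
$$x_1 = \frac{E - (E - d_1) - 0}{2} + 0 = \frac{d_1}{2},$$
$$x_2 = \frac{E - (E - d_1) - 0}{2} + (E - d_1) = \frac{d_1}{2} + (E - d_1) = E - \frac{d_1}{2}.$$

Thus, bank 1 gets $\frac{d_1}{2}$, half the amount the bank lent, and the other bank (bank 2) gets the remaining.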

Let us continue to analyze when $d_1 < E$, $d_2 < E$ but $d_1 + d_2 > E$.

We get $\theta_1^+ = E - d_1$, $\theta_2^+ = E - d_2$. What remains after conceding by each is $E - \theta_1^+ - \theta_2^+$. This is split evenly by the 2 banks and in addition, each bank receives what the opponent has conceded. Consider the following case.

Let $d_1 = 1000$, $d_2 = 2000$, $E = 2400$ say. The CG (contested garment) principle will offer

$$x_1 = \frac{2400 - 1400 - 400}{2} + 400 = 700,$$
$$x_2 = \frac{2400 - 1400 - 400}{2} + 1400 = 1700.$$

(Here $\theta_1^+ = \max(2400 - 1000, 0)$, $\theta_2^+ = \max(2400 - 2000, 0)$.)
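The two-creditor rule is short enough to write out directly. The sketch below follows the formulas for $x_1, x_2$ given above; the function name is a hypothetical choice, and the saree case is encoded by taking the garment as an estate of 1 with claims 1 and 1/2.

```python
def contested_garment(estate, d1, d2):
    """Contested garment division of `estate` between claims d1 and d2."""
    c1 = max(estate - d1, 0)       # what creditor 1 concedes to creditor 2
    c2 = max(estate - d2, 0)       # what creditor 2 concedes to creditor 1
    contested = estate - c1 - c2   # the part both of them claim
    return contested / 2 + c2, contested / 2 + c1

print(contested_garment(2400, 1000, 2000))   # the worked case above
print(contested_garment(1, 1, 0.5))          # the saree: elder claims all, younger half
```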

Still, this does not resolve the issue for the 3 person case. Our next step is to search for solutions that coincide with the CG principle for each pair for any division of $E$.

Let $E \le d_1 + \cdots + d_n$ ($n$ banks have lent), $d_1 \ge d_2 \ge \cdots \ge d_n > 0$, so that the estate $E$ has no money to pay in full the money borrowed.

We call a solution $(x_1, x_2, \ldots, x_n)$ pairwise consistent if for all $i \ne j$, considering just the banks $i, j$ that have lent $d_i$ and $d_j$, the solution awards $x_i, x_j$ as though the total estate was only $x_i + x_j$. We call the solution consistent if for each pair $i \ne j$ the amounts $x_i, x_j$ correspond to the CG division of $x_i + x_j$.




A key observation is that for fixed $d_i, d_j$, the two awards are monotonic in $E$.


Theorem 4.3 Each bankruptcy problem with $n$ lenders lending $d_1, d_2, \ldots, d_n$ to a person whose estate is worth $E \le \sum_i d_i$ has a unique consistent solution.


Proof We first show there cannot be two distinct consistent solutions. Suppose $x$ and $y$ are two consistent solutions and $x \ne y$. We can find two lenders $i, j$ receiving $x_i, x_j$ according to $x$ and $y_i, y_j$ according to $y$ such that $x_i + x_j \le y_i + y_j$, yet $x_i < y_i$ and $x_j > y_j$. The CG principle awards $y_j$ to $j$ if $E = y_i + y_j$ and $x_j$ when it is $E = x_i + x_j$. But $y_i + y_j \ge x_i + x_j$, and the monotonicity of the CG principle demands $y_j \ge x_j$, a contradiction to $y_j < x_j$.

We will now show how we can build a consistent solution. We assume $E$ is slowly growing for fixed loans $d_1, d_2, \ldots, d_n$. For small $E$ the lenders share equally. Suppose $d_1 \le d_2 \le \cdots \le d_n$; then at some stage everyone would have received $\frac{d_1}{2}$. Then temporarily lender 1 receives no more payments. The remaining amount $E - n\frac{d_1}{2}$ is now distributed equally among the other lenders till the 2nd lender gets $\frac{d_2}{2}$. Now both lender 1 and lender 2 stop receiving any more. We continue this procedure till each lender $i$ has received $\frac{d_i}{2}$, $i = 1, 2, \ldots, n$. This will happen when $\frac{d_1 + \cdots + d_n}{2} = E$.

In case $E > \frac{D}{2} = \frac12$ (Total debt), the process is to think of how much each lender has to lose. If $d_1 - x_1, d_2 - x_2, \ldots, d_n - x_n$ is what they lose, then when $(D - E)$ is small, it is shared equally. Thus, in this case the lender $i$ with $d_i$ receives $d_i - \frac{D - E}{n}$. When the common loss hits $\frac{d_1}{2}$ for the first time, lender 1 is not allowed to lose any more than $\frac12$ of what he lent. The procedure continues with the rest of the lenders $2, 3, \ldots, n$ losing the same amount till it hits $\frac{d_2}{2}$. Now the remaining losses are distributed among $3, 4, \ldots, n$ the same way, etc. We have chosen to dig the tunnel from the other end and have met in the middle.
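The two-sided filling procedure in the proof can be sketched as follows. This is one possible implementation under the assumptions of the proof (equal shares capped at half the claim, applied to awards when $E \le D/2$ and to losses otherwise); the function names are hypothetical.

```python
def capped_equal_shares(amount, caps):
    """Share `amount` equally, except that nobody receives more than their cap."""
    shares = [0.0] * len(caps)
    remaining = float(amount)
    active = sorted(range(len(caps)), key=lambda i: caps[i])
    while active:
        share = remaining / len(active)
        smallest = active[0]
        if caps[smallest] <= share:
            # This lender is stopped at the cap; the others continue.
            shares[smallest] = caps[smallest]
            remaining -= caps[smallest]
            active.pop(0)
        else:
            for i in active:
                shares[i] = share
            break
    return shares

def consistent_solution(estate, claims):
    half = [d / 2 for d in claims]
    if estate <= sum(half):
        # Fill awards from below, each capped at half the claim.
        return capped_equal_shares(estate, half)
    # Otherwise share the total loss D - E, each loss capped at half the claim.
    losses = capped_equal_shares(sum(claims) - estate, half)
    return [d - loss for d, loss in zip(claims, losses)]

print(consistent_solution(2000, [1000, 2000, 3000]))
print(consistent_solution(4200, [1000, 2000, 3000]))
```

For an estate of 2000 this gives awards of 500, 750 and 750, and one can check pair by pair that each pair of awards is the contested garment division of its own sum.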


Example 4 


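Let $d_1 = 1000$, $d_2 = 2000$, $d_3 = 3000$, $E = 4200$, so that $D = 6000$ and $D > E > \frac{D}{2}$. Thus, we choose to find awards in terms of how far they fall short of what one has lent. The total loss to be shared is $D - E = 1800$.

In round 1, say 500, 500, 500 is the amount all lose. Now the loss has hit $\frac{d_1}{2} = 500$. Thus, lender 1 has been met with $\frac12$ the amount the borrower took from lender 1 and loses no more. After paying $x_1 = 500$, the estate is left with $E' = 3700$ for lenders 2 and 3, whose awards so far stand at $d_2' = 2000 - 500 = 1500$ and $d_3' = 3000 - 500 = 2500$, and $E' > \frac{d_2 + d_3}{2}$. A loss of 300 remains to be shared. Writing $x_2', x_3'$ for the further losses of lenders 2 and 3, we have $d_1 - x_1 = 500$ and the awards $d_2' - x_2' = 1500 - x_2'$, $d_3' - x_3' = 2500 - x_3'$. If $x_2' = 150$, $x_3' = 150$, then the losses are equally split between banks 2 and 3 and yet each one gets more than $\frac12$ their lent amount. Thus $x^* = (500, 1350, 2350)$.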


Remark 4.2 When the estate $E \le \frac{D}{2}$ ($= \frac12$ the total loan), the individual loaners feel happy if they could receive at least half of what they loaned. When $E > \frac{D}{2}$, the loaners aim to get as much as they can and at least would continue till the actual losses are no more than the actual losses of loaners who loaned more.
The consistent rule we used has two properties.


Definition 4.2 A solution $x$ to $(E, d)$ with $d = (d_1, \ldots, d_n)$, $x = (x_1, \ldots, x_n)$ is order preserving if $0 \le x_1 \le x_2 \le \cdots \le x_n$ and $0 \le d_1 - x_1 \le d_2 - x_2 \le \cdots \le d_n - x_n$.


Thus, people with a higher claim than another get an award that is no smaller
and at the same time suffer a loss that is no smaller.

Depending on E and d, we can treat the n person problem in one of the following 
three ways: 


(i) Divide $E$ between loaner 1 and the $(n - 1)$ persons $\{2, 3, \ldots, n\}$ as a coalition using the CG principle, where the coalition $\{2, \ldots, n\}$ divides the amount by induction.

(ii) Assign equal award to all loaners.

(iii) Award equal losses to all loaners.

Specifically, (i) is used whenever it yields an order preserving result. This happens when $n\frac{d_1}{2} \le E \le D - n\frac{d_1}{2}$. We apply (ii) when $E \le n\frac{d_1}{2}$ and (iii) when $E \ge D - n\frac{d_1}{2}$. Here $d = (d_1, \ldots, d_n)$, $D = \sum_i d_i$.


Example 5 


Bankers 1, 2, ..., 5 lent 100, 200, 300, 400, and 500 to a person with $E = 510$. Suppose the coalition $\{2, 3, 4, 5\}$ forms. The CG rule offers 50 to 1 and leaves the remaining amount 460 to be split among $\{2, 3, 4, 5\}$ with a joint claim of 1400. Suppose this splits into $\{2\}$ and $\{3, 4, 5\}$. Again applying the CG rule gives 100 to $\{2\}$ and leaves 360 to split among $\{3, 4, 5\}$ with a joint claim of 1200. Now any further split of this coalition as $\{3\}$ and $\{4, 5\}$ applying the CG rule will offer 150 to $\{3\}$ and leave 210 to share between loaners 4 and 5, so at least one will get less than 150, violating order preservation. Thus, they divide equally giving the final solution as $(50, 100, 120, 120, 120)$, which is order preserving. In fact we have the following.
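The coalitional procedure of cases (i) to (iii) lends itself to a short recursion. The sketch below assumes the claims are listed in increasing order and uses the thresholds $n d_1/2$ and $D - n d_1/2$ stated above; the function name is hypothetical.

```python
def coalitional_procedure(estate, claims):
    """Recursive division; `claims` must be sorted in increasing order."""
    n = len(claims)
    if n == 1:
        return [float(estate)]
    d1, total = claims[0], sum(claims)
    if estate <= n * d1 / 2:             # case (ii): equal awards
        return [estate / n] * n
    if estate >= total - n * d1 / 2:     # case (iii): equal losses
        loss = (total - estate) / n
        return [d - loss for d in claims]
    # Case (i): contested garment between lender 1 and the coalition of the rest.
    rest_claim = total - d1
    c1 = max(estate - d1, 0)             # conceded by lender 1
    c_rest = max(estate - rest_claim, 0) # conceded by the coalition
    x1 = (estate - c1 - c_rest) / 2 + c_rest
    return [x1] + coalitional_procedure(estate - x1, claims[1:])

print(coalitional_procedure(510, [100, 200, 300, 400, 500]))
```

On the data of Example 5 the recursion passes through the same splits (50, then 100, then an equal division of 360).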


Theorem 4.4 The above coalitional procedure yields an order preserving solution to
all bankruptcy problems. Further, it is the consistent solution.


Remark 4.3 To arrive at such a solution, we cannot expect success if we do not 
choose the right kind of coalitions to apply the CG principle. 


A natural cooperative game induced by the bankruptcy problem is through the TU game $v(S) = \max\{E - d(N \setminus S), 0\}$. This is a game where each coalition $S$ is willing to forego the maximal demand of others and is willing to share whatever is left over in the estate, if any. This way they try to settle their claims out of court. The following is the main theorem.


Theorem 4.5 The consistent solution of a bankruptcy problem coincides with the nucleolus of the game corresponding to $v(S) = (E - d(N \setminus S))_+ = \max(E - d(N \setminus S), 0)$.




Proof See Aumann and Maschler [2].


Remark 4.4 The formal proof of the above theorem is based on the notion of the reduced game property of the nucleolus. Suppose a coalition $S$ and payoffs $\{x_i, i \in S\}$ are given. The reduced game on the player space $S$ is defined by

$$v^{S,x}(T) = \begin{cases} x(T) & \text{if } T = S \text{ or } T = \emptyset \\ \max\{v(Q \cup T) - x(Q) : Q \subset N \setminus S\} & \text{if } \emptyset \ne T \subsetneq S. \end{cases}$$

The intuitive idea is that the players in $S$ can try to form new coalitions by hiring members outside $S$, like $Q \subset N \setminus S$, but paying exactly the wage $x(Q)$ for joining them. Each non-empty coalition $T \subset S$ can look for the right coalition $Q \subset N \setminus S$ which will give maximum net value after the fixed wage $x(Q)$ is paid to them. One demands that if $x$ is the solution to the bankruptcy problem $(E, d)$ with $0 \le x_i \le d_i\ \forall i$, then for any coalition $S$,

$$v^{S,x}_{E;d} = v_{x(S);\, d|S}.$$

In other words the reduced bankruptcy game coincides with the game corresponding to the reduced bankruptcy problem. We leave its technical proof to [2].
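The identity in Remark 4.4 can be tested on a small instance. The sketch below computes the reduced game by brute force for the three-bank problem with $E = 4200$ and awards $(500, 1350, 2350)$ from Example 4, and compares it with the bankruptcy game of the reduced problem on $S = \{2, 3\}$. The dictionary-based representation and the function names are illustrative assumptions.

```python
from itertools import combinations

def subsets(items):
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def bankruptcy_worth(estate, claims, S):
    """claims maps player -> claim; worth is max(E - claims outside S, 0)."""
    outside = sum(d for i, d in claims.items() if i not in S)
    return max(estate - outside, 0)

def reduced_game(estate, claims, x, S):
    """Reduced game on S at the payoff x (x maps player -> award)."""
    S = frozenset(S)
    outsiders = [i for i in claims if i not in S]
    game = {}
    for T in subsets(S):
        if T == S or not T:
            game[T] = sum(x[i] for i in T)
        else:
            # Hire any group Q of outsiders at the fixed wage x(Q).
            game[T] = max(bankruptcy_worth(estate, claims, T | Q) - sum(x[i] for i in Q)
                          for Q in subsets(outsiders))
    return game

claims = {1: 1000, 2: 2000, 3: 3000}
x = {1: 500, 2: 1350, 3: 2350}
S = {2, 3}
reduced = reduced_game(4200, claims, x, S)
sub_claims = {i: claims[i] for i in S}
direct = {T: bankruptcy_worth(sum(x[i] for i in S), sub_claims, T) for T in subsets(S)}
print(reduced == direct)
```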


4.3 The Core and Balanced Collection


Given a TU game $(v, N)$, the core $C(v, N) = \{(x_1, \ldots, x_n) : x_i \ge v(\{i\}),\ \sum_{j \in S} x_j \ge v(S),\ \sum_{j \in N} x_j = v(N)\}$.

In general the core could be empty. For example if we consider the simple game with at least 3 players given by

$$v(S) = \begin{cases} 1 & \text{if } |S| \ge 2 \\ 0 & \text{if } |S| \le 1. \end{cases}$$

Clearly, if $(x_1, x_2, \ldots, x_n)$ is any imputation, then it is a core element if and only if $x_i + x_j \ge 1\ \forall i \ne j$ and $x_1 + \cdots + x_n = 1$, which forces $x_1 = x_2 = \cdots = x_n = 0$, a contradiction. Thus the game has no core element.

Horse Market: Two sellers and 3 buyers are in a horse market. Each seller has one horse to sell. Each buyer wants to buy one horse. Sellers 1 and 2 are expecting at least 200K and 300K for their horses. The 3 buyers value the two identical-looking horses at no more than 350K, 250K and 200K, respectively. A TU game with the 5 players $\{S_1, S_2, B_1, B_2, B_3\}$ can be formed. Suppose a price $p$ is announced with 200K $\le p \le$ 350K. Then we can define $v(S_1, B_2)$ as the sum of the gains for the two players, $p - 200$ for the seller and $250 - p$ for the buyer, so $v(S_1, B_2) = p - 200 + 250 - p = 50$. We see $v(S_1, B_3) = 0$, since $p \ge 200$ implies the gain is 0 for the coalition. Also $v(S_1 S_2 B_1 B_2 B_3) = v(S_1, B_1) + v(S_2, B_2) = 150 + 0 = 150$.

Any core imputation $(x_1, x_2; y_1, y_2, y_3)$, where the $x$'s represent the sellers' rewards and the $y$'s represent the buyers' rewards, satisfies

$x_1 + y_1 \ge v(S_1 B_1) = 150$
$x_1 + y_2 \ge v(S_1 B_2) = 50$
$x_1 + y_3 \ge v(S_1 B_3) = 0$
$x_2 + y_1 \ge v(S_2 B_1) = 350 - 300 = 50$
$x_2 + y_2 \ge v(S_2 B_2) = \max(250 - 300, 0) = 0$
$x_2 + y_3 \ge v(S_2 B_3) = \max(200 - 300, 0) = 0.$


Since coalitions of more than 2 participants split into an optimal coalitional split, $v(S_1, S_2, B_2) = \max\{v(S_1, B_2), v(S_2, B_2)\} = \max\{50, 0\} = 50$.

Here, $S_2$ expects a price $p \ge$ 300K and $B_2$ expects a price $\le$ 250K. Thus, only the optimal buyer-seller pairing is used in defining the coalitional worth. Since coalitions of just sellers or just buyers have no merit, we are looking for a feasible solution to $x_i + y_j \ge a_{ij}$ where $a_{ij}$ is the coalitional gain of seller $i$ and buyer $j$. Adding dummy buyers or sellers, we can assume the problem to have $n$ buyers and $n$ sellers, with $a_{ij} = 0$ for all dummy buyer-seller combinations. The real pairing will satisfy $x_i + y_j = a_{ij}$ if $(i, j)$ is part of the optimal pairing.

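Thus, our problem reduces to finding an optimal pairing $(i, \sigma(i))$ of sellers and buyers such that

$$\sum_i a_{i\sigma(i)} \ge \sum_i a_{i\tau(i)} \quad \forall \text{ pairing } \tau.$$

This is the famous optimal assignment problem and is equivalent to

$$\max \sum_i \sum_j a_{ij} x_{ij}$$
such that
$$\sum_j x_{ij} = 1,\ i = 1, 2, \ldots, n; \qquad \sum_i x_{ij} = 1,\ j = 1, 2, \ldots, n; \qquad x_{ij} \ge 0\ \ \forall i, j = 1, 2, \ldots, n.$$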


Since the $(x_{ij})$ matrix is doubly stochastic, and permutation matrices are the extreme points of the convex set of doubly stochastic matrices (see Bapat and Raghavan [4]), our problem becomes a linear programming problem with an optimal permutation matrix as the solution. By the duality theorem of linear programming, we can consider the dual

$$\min \sum_i u_i + \sum_j v_j$$
such that
$$u_i + v_j \ge a_{ij} \quad \forall i, j.$$

At any optimal solution $(x^*_{ij})$ to the primal, we have by complementary slackness
$$u^*_i + v^*_j = a_{ij} \quad \text{if } x^*_{ij} > 0.$$

Further $\sum_i u^*_i + \sum_j v^*_j$ is the worth of the grand coalition.


Indeed, the core imputations are precisely the optimal solutions $(u_1, \ldots, u_n; v_1, \ldots, v_n)$ for the seller-buyer pairs of the above dual LP, and the core is therefore non-empty.
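For the small horse market with two sellers and three buyers, the optimal pairing and a dual solution can be verified by enumeration. This is a minimal sketch: the brute-force search stands in for an LP solver, and the dual vector $u = (100, 0)$, $v = (50, 0, 0)$ is one illustrative choice (corresponding to a price of 300K for the first horse), not a value taken from the text.

```python
from itertools import permutations

def gains(seller_floors, buyer_ceilings):
    """a[i][j] = gain available to seller i and buyer j, zero if no trade is possible."""
    return [[max(b - s, 0) for b in buyer_ceilings] for s in seller_floors]

def optimal_assignment(a):
    """Brute force over pairings; assumes no more rows (sellers) than columns (buyers)."""
    rows, cols = len(a), len(a[0])
    best = max(permutations(range(cols), rows),
               key=lambda p: sum(a[i][p[i]] for i in range(rows)))
    return sum(a[i][best[i]] for i in range(rows)), best

def is_dual_feasible(a, u, v):
    return all(u[i] + v[j] >= a[i][j]
               for i in range(len(a)) for j in range(len(a[0])))

a = gains([200, 300], [350, 250, 200])
value, pairing = optimal_assignment(a)
u, v = [100, 0], [50, 0, 0]            # illustrative dual solution (an assumption)
print(a, value)
print(is_dual_feasible(a, u, v), sum(u) + sum(v) == value)
```

A feasible dual vector whose total equals the optimal assignment value is dual optimal, hence a core imputation by the statement above.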
Example 6 


Find a suitable core imputation for the horse market with 6 sellers and 5 buyers where 
sellers’ base prices and buyers’ ceiling prices are 


$S_1 \ge 200$    $B_1 \le 350$
$S_2 \ge 230$    $B_2 \le 300$
$S_3 \ge 240$    $B_3 \le 260$
$S_4 \ge 245$    $B_4 \le 250$
$S_5 \ge 255$    $B_5 \le 240$
$S_6 \ge 270$.


We first look for a price $p$ at which demand = supply.

Trial 1: $p = 210$ $\implies$ exactly 1 seller is interested and 5 buyers are interested at this price. Since supply is less than demand, the price increases. Say $p = 250$. Exactly 4 sellers are interested, namely $S_1, S_2, S_3, S_4$. At this price 4 buyers are interested, namely $B_1, B_2, B_3, B_4$. One buyer, $B_4$, has no gain at this price. It favors sellers. When $p = 245$, again $S_1, S_2, S_3, S_4$ are willing and $B_1, B_2, B_3, B_4$ are willing at this price. Thus, we have 2 prices $p_* = 245$ and $p^* = 250$ and in this range $245 \le p \le 250$, we have matching demand and supply. Here, price $p = 245$ favors all buyers and 250 favors all sellers. With $p = 245$ favoring all buyers, the arrangement is say $S_1 B_1$; $S_2 B_2$; $S_3 B_3$; $S_4 B_4$ as the match.

Let $(u_1, u_2, \ldots, u_6; v_1, v_2, \ldots, v_5)$ be a dual optimal solution. We have

$u_1 + v_1 = 350 - 200$
$u_2 + v_2 = 300 - 230$
$u_3 + v_3 = 260 - 240$
$u_4 + v_4 = 250 - 245$

At $p = 245$:
$S_1$ gets a gain of $245 - 200 = 45 = u_1$
$B_1$ gets a gain of $350 - 245 = 105 = v_1$
$S_2$ gets a gain of $245 - 230 = 15 = u_2$
$B_2$ gets a gain of $300 - 245 = 55 = v_2$
$S_3$ gets a gain of $245 - 240 = 5 = u_3$
$B_3$ gets a gain of $260 - 245 = 15 = v_3$
$S_4$ gets a gain of $245 - 245 = 0 = u_4$
$B_4$ gets a gain of $250 - 245 = 5 = v_4$

$u_5 = 0$, $v_5 = 0$, $u_6 = 0$.


The imputation $(45, 15, 5, 0, 0, 0; 105, 55, 15, 5, 0)$ is uniformly good for all buyers. The nucleolus is the midpoint of these extreme ends of the core with $p = 245$ and $p = 250$. It is $(47.5, 17.5, 7.5, 2.5, 0, 0; 102.5, 52.5, 12.5, 2.5, 0)$.
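The price search of Example 6 can be reproduced with a few lines. The sketch below scans integer prices (a simplifying assumption), counts a seller as interested when the price is at least the base price and a buyer when it is at most the ceiling, and then computes the gains at a given price. Function names are hypothetical.

```python
sellers = [200, 230, 240, 245, 255, 270]   # sellers' base prices
buyers = [350, 300, 260, 250, 240]         # buyers' ceiling prices

def supply(p):
    return sum(1 for s in sellers if s <= p)

def demand(p):
    return sum(1 for b in buyers if b >= p)

# Integer prices at which supply matches demand.
clearing = [p for p in range(200, 351) if supply(p) == demand(p)]

def gains_at(p):
    """Gains (u; v) of sellers and buyers when every trade happens at price p."""
    u = [max(p - s, 0) for s in sellers]
    v = [max(b - p, 0) for b in buyers]
    return u, v

print(clearing[0], clearing[-1])
print(gains_at(245))
print(gains_at((clearing[0] + clearing[-1]) / 2))   # midpoint price 247.5
```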

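The following theorem gives a necessary and sufficient condition for any TU game $(v, N)$ to possess a non-empty core.
Theorem 4.6 (Bondareva and Shapley) The core of a TU game $(v, N)$ is non-empty if and only if the linear program

$$\min \sum_{i \in N} x_i \quad \text{subject to} \quad \sum_{i \in S} x_i \ge v(S)\ \ \forall S \subset N$$

has a minimum $z^* \le v(N)$.


Proof Let $x^*$ be any minimizing element with $z^* \le v(N)$. Raising coordinates of $x^*$ until they sum to $v(N)$ keeps every constraint satisfied, so the resulting vector is in the core. Conversely, suppose $x$ is any core element. Then $x$ satisfies $\sum_{i \in S} x_i \ge v(S)$ and $\sum_{i \in N} x_i = v(N)$. Thus the minimum must be $z^* \le v(N)$.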


Consider the dual to the above linear program. It is given by

$$\max \sum_{S \subset N} \lambda_S v(S) \quad \text{subject to} \quad \sum_{S \ni i} \lambda_S = 1\ \ \forall i \in N, \qquad \lambda_S \ge 0\ \ \forall S \subset N.$$

When the core is non-empty, we have an optimal solution to the primal and hence by the duality theorem we have an optimal solution to the above dual program. Further at any optimal solution $\lambda^*$ of the dual, we have

$$\sum_{S \subset N} \lambda^*_S v(S) \le v(N).$$

While the theorem per se is only of theoretical interest, the dual problem's extreme solutions have a certain elegant combinatorial structure.


Definition 4.3 Let $S_1, S_2, \ldots, S_m$ be a collection of non-empty subsets of $N = \{1, 2, \ldots, n\}$. We call them a balanced collection if for some $\gamma_1, \gamma_2, \ldots, \gamma_m > 0$

$$\sum_{j : S_j \ni i} \gamma_j = 1 \quad \text{for each } i \in N.$$

Intuitively, if we think of $S_1, S_2, \ldots, S_m$ as coalitions and consider only those among them that include player $i$, he allocates his unit time among these coalitions, devoting a proportion $\gamma_j$ of his available time to coalition $S_j$. In a sense it is a generalized partition of $N$.


Example 7 


$N = \{1, 2, 3, 4\}$, $S_1 = \{1, 2\}$, $S_2 = \{1, 3\}$, $S_3 = \{1, 4\}$, $S_4 = \{2, 3, 4\}$. This is a balanced collection with $\gamma_1 = \frac13$, $\gamma_2 = \frac13$, $\gamma_3 = \frac13$, $\gamma_4 = \frac23$.
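The balancing condition of Definition 4.3 is easy to check mechanically. The sketch below verifies the collection of Example 7 with weights $\frac13, \frac13, \frac13, \frac23$, using exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def is_balanced(players, collection, weights):
    """True if the weights are positive and, for every player, the weights of
    the coalitions containing that player add up to exactly 1."""
    if any(w <= 0 for w in weights):
        return False
    return all(sum(w for S, w in zip(collection, weights) if i in S) == 1
               for i in players)

N = {1, 2, 3, 4}
collection = [{1, 2}, {1, 3}, {1, 4}, {2, 3, 4}]
weights = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3), Fraction(2, 3)]
print(is_balanced(N, collection, weights))
```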


We call a balanced collection minimal if it has no proper subcollection which is 
balanced. 


Theorem 4.7 The extreme points of the polyhedron defined by

$$\sum_{S \ni i} \lambda_S = 1 \quad \forall i \in \{1, 2, \ldots, n\} \qquad (4.1)$$

are the balancing vectors of the minimal balanced collections.


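Proof Clearly, $\lambda$ is a balancing vector for the collection $\mathcal{S} = \{S : \lambda_S > 0\}$. Suppose $\mathcal{S}$ is not minimal; then we have a subcollection $\mathcal{B}$ with balancing vector $\{\mu_S : S \subset N\}$. We know $\mu_S > 0 \implies \lambda_S > 0$. Thus, for $\epsilon$ small, $\omega = (1 - \epsilon)\lambda + \epsilon\mu$ and $\omega' = (1 + \epsilon)\lambda - \epsilon\mu$ satisfy the constraints (4.1). But $\omega \ne \omega'$ as $\omega_S < \omega'_S$ when $S \notin$ subcollection $\mathcal{B}$. Now $\lambda = \frac{\omega + \omega'}{2}$ and $\lambda$ is not an extreme point.

Conversely let $\mathcal{S}$ be a minimal balanced collection with balancing vector $\lambda$. In case $\lambda$ is not an extreme point, we can find $\omega \ne \omega'$ with $\lambda = \frac{\omega + \omega'}{2}$ and both satisfy (4.1). Clearly, $\lambda_S = 0 \implies \omega_S = \omega'_S = 0$ (as $\omega_S, \omega'_S$ are $\ge 0$). Let $\mathcal{B} = \{S : \omega_S > 0\}$ and $\mathcal{B}' = \{S : \omega'_S > 0\}$. They are both subcollections of $\mathcal{S}$ and since $\mathcal{S}$ is minimal balanced, $\mathcal{B} = \mathcal{B}' = \mathcal{S}$ and $\omega = \omega' = \lambda$. This contradiction shows that $\lambda$ must be extreme to (4.1).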


Remark 4.5 One can check that at most $n$ of the $\lambda_S$'s are positive.


Theorem 4.8 A TU game $(v, N)$ has non-empty core if and only if each minimal balanced collection $\mathcal{S} = \{S_1, \ldots, S_m\}$ with balancing vector $(\lambda_1, \ldots, \lambda_m)$ satisfies

$$\sum_j \lambda_j v(S_j) \le v(N).$$


Proof Since the associated LP

$$\max \sum_{S \subset N} \lambda_S v(S) \quad \text{subject to} \quad \sum_{S \ni i} \lambda_S = 1,\ i \in N, \qquad \lambda_S \ge 0$$

attains its optimal value at an extreme point, we have the required result.
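For three players the test of Theorem 4.8 is finite and explicit. The sketch below relies on the standard fact that, apart from $\{N\}$ itself, the minimal balanced collections on $\{1, 2, 3\}$ are the four partitions into proper coalitions and the collection of all pairs with weights $\frac12$; this enumeration is stated here as an assumption of the sketch rather than taken from the text, and the function name is hypothetical. The three games tested are the majority game and the games of Examples 8 and 9.

```python
def core_is_nonempty_3(v):
    """Balancedness test for a 3-player game; v maps frozensets to worths."""
    def w(*players):
        return v[frozenset(players)]
    total = w(1, 2, 3)
    checks = [
        w(1) + w(2) + w(3),                      # partition into singletons
        w(1) + w(2, 3),                          # partitions {i}, {j, k}
        w(2) + w(1, 3),
        w(3) + w(1, 2),
        (w(1, 2) + w(1, 3) + w(2, 3)) / 2,       # all pairs, weights 1/2
    ]
    return all(c <= total for c in checks)

def game(pairs, grand):
    """3-player game with zero singletons; pairs = (v12, v13, v23)."""
    v = {frozenset({i}): 0 for i in (1, 2, 3)}
    v[frozenset({1, 2})], v[frozenset({1, 3})], v[frozenset({2, 3})] = pairs
    v[frozenset({1, 2, 3})] = grand
    return v

print(core_is_nonempty_3(game((1, 1, 1), 1)))       # simple majority game
print(core_is_nonempty_3(game((30, 30, 80), 100)))  # game of Example 8
print(core_is_nonempty_3(game((0, 0, 10), 6)))      # game of Example 9
```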


4.4 The Lexicographic Center [20] 


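Given a TU game $(v, N)$ and the imputation set $X^0$, the satisfaction $f(x, S) = x(S) - v(S)$ for any coalition $S$ may vary depending on the imputation $x$. However, $f(x, \emptyset) = 0$ and $f(x, N) = 0$ for $x \in X^0$. Let $\Sigma^0$ = all coalitions of $N$ other than the empty coalition $\emptyset$ and the grand coalition $N$. Consider on $X^0$ the function

$$\theta(x) = \min_{S \in \Sigma^0} f(x, S).$$

Since $f(x, S)$ is affine linear, $\theta(x)$ as a minimum of affine linear functions is a concave function and also $\theta(x)$ is continuous on $X^0$. Let $\max_{x \in X^0} \theta(x) = \alpha^1$. Now consider the set $X^1 \subset X^0$ where

$$X^1 = \{x \in X^0 : \min_{S \in \Sigma^0} f(x, S) = \alpha^1\}.$$

We can check the following. Suppose $x^*, x^\circ \in X^1$. We know $\theta(x^*) = \min_{S \in \Sigma^0} f(x^*, S) = \alpha^1$. Similarly $\theta(x^\circ) = \alpha^1$. Since $\theta$ is a concave function on $X^0$, $\forall\ 0 \le \lambda \le 1$, $\theta(\lambda x^* + (1 - \lambda) x^\circ) \ge \lambda \alpha^1 + (1 - \lambda) \alpha^1$. However, $\max \theta(x) = \alpha^1$ and so the imputation $\lambda x^* + (1 - \lambda) x^\circ$ is also in $X^1$. That is, $X^1$ is a closed convex set. Let $\Sigma_1 = \{S \in \Sigma^0 : f(x, S) = \alpha^1\ \forall x \in X^1\}$. We claim $\Sigma_1 \ne \emptyset$. Suppose not. Then given any $S_j \in \Sigma^0$ there exists an $x_j \in X^1$ such that $f(x_j, S_j) \ne \alpha^1$. Since $x_j \in X^1$, $\theta(x_j) = \alpha^1$ and so $f(x_j, S_j) \ge \theta(x_j) = \alpha^1$ and so $f(x_j, S_j) > \alpha^1$.

If we define $\bar{x} = \frac{1}{m} \sum_j x_j$, where $m$ is the number of coalitions in $\Sigma^0$, then since $X^1$ is a convex set and since $x_j \in X^1$, $\bar{x} \in X^1$. Thus, we should have a coalition $\tilde{S}$ such that $f(\bar{x}, \tilde{S}) = \theta(\bar{x}) = \alpha^1$. However, $\tilde{S} \in \Sigma^0$ and since $\tilde{S}$ is one of the $S_j$'s in $\Sigma^0$, for some $x_j$ we have $f(x_j, \tilde{S}) > \alpha^1$ while $f(x_k, \tilde{S}) \ge \alpha^1\ \forall k$, so $f(\bar{x}, \tilde{S}) > \alpha^1$, a contradiction. Hence, $X^1 \subset X^0$ and $\Sigma^1 = \Sigma^0 \setminus \Sigma_1 \ne \Sigma^0$ and $\Sigma^0 \supset \Sigma^1$.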

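We can inductively define for each $k = 2, \ldots, k^*$:

$$X^k = \{x \in X^{k-1} : \min_{S \in \Sigma^{k-1}} f(x, S) = \alpha^k\},$$
$$\Sigma_k = \{S \in \Sigma^{k-1} : f(x, S) = \alpha^k\ \forall x \in X^k\},$$
$$\Sigma^k = \Sigma^{k-1} \setminus \Sigma_k,$$

where $\alpha^k = \max_{x \in X^{k-1}} \min_{S \in \Sigma^{k-1}} f(x, S)$, and $k^*$ is the first value of $k$ for which $\Sigma^k = \emptyset$.
We claim that $\alpha^1 < \alpha^2 < \cdots < \alpha^k < \alpha^{k+1}$. For a suitable $x \in X^k$, obtained by averaging as above, we have $f(x, S) > \alpha^k\ \forall S \in \Sigma^k$. Thus, $\alpha^{k+1} \ge \min_{S \in \Sigma^k} f(x, S) > \alpha^k$.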


Theorem 4.9 The lexicographic center of a game consists of a unique point, and it 
is the nucleolus. 


Proof We know that the lexicographic center $X^{k^*} \ne \emptyset$. The satisfaction of each coalition $S$ is constant in $X^k$ if $S \in \Sigma_k$, and hence constant in $X^{k^*}$. As this holds for singleton coalitions, we see that $X^{k^*}$ can't have more than one point. We will show that the lexicographic center coincides with the nucleolus. To prove this we need the following.


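Lemma 4.3 For any $1 \le k \le k^*$, if $x \in X^k$ and if $y \in X^{k-1} \setminus X^k$, then $\theta(x)$ is lexicographically larger than $\theta(y)$.


Proof Consider $\Sigma_1, \Sigma_2, \ldots, \Sigma_{k-1}, \Sigma^{k-1}, \{\emptyset, N\}$. They constitute a partition of $2^N$. Ignore $\emptyset$ and $N$ as $\theta(x)$ and $\theta(y)$ have the same value zero for these coordinates. Since $x \in X^k$ and $y \in X^{k-1} \setminus X^k$, we have

$$f(x, S) = \alpha^j \quad \text{and} \quad f(y, S) = \alpha^j$$

for all $S$ in $\Sigma_j$, $j = 1, 2, \ldots, k - 1$. Further $\forall S \in \Sigma^{k-1}$, we have $f(x, S) \ge \alpha^k > \alpha^{k-1}$ and $f(y, S) > \alpha^{k-1}$. However, since $y \notin X^k$ we have $\alpha^k > f(y, R) > \alpha^{k-1}$ for some $R \in \Sigma^{k-1}$. Hence $\theta(x)$ is lexicographically larger than $\theta(y)$.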


Example 8 


Find the nucleolus for the following TU game: $v(1) = v(2) = v(3) = 0$, $v(1, 2) = v(1, 3) = 30$, $v(2, 3) = 80$, $v(1, 2, 3) = 100$.
We will consider $\alpha^1 = \max_x \min_{S \ne \emptyset, \{1,2,3\}} f(x, S)$. Let $\theta(x) = \min_S f(x, S)$; then $\alpha^1 = \max \theta(x)$. Thus it reduces to the following LP:

$$\max \theta$$
$$x_1 \ge \theta,\ x_2 \ge \theta,\ x_3 \ge \theta,\ x_1 + x_2 - 30 \ge \theta,\ x_1 + x_3 - 30 \ge \theta$$
$$\text{and } x_2 + x_3 - 80 \ge \theta,\ x_1 + x_2 + x_3 = 100.$$

Solving the LP, one gets $\theta = 10$ and an optimal solution is $x_1^* = 10$, $x_2^* = 60$, $x_3^* = 30$. However, the constraints $x_1^* \ge 10$, $x_2^* + x_3^* \ge 90$, $x_1^* + x_2^* + x_3^* = 100$ imply $x_1^* = 10$, $x_2^* + x_3^* = 90$. It is so for any optimal solution as it should satisfy the constraints for $\theta = 10$.

Thus $\Sigma_1 = (\{1\}, \{2, 3\})$. Our next LP corresponds to

$$\max \theta$$
$$x_1 = 10,\ x_2 + x_3 = 90,\ x_2 \ge \theta,\ x_3 \ge \theta,\ x_1 + x_2 - 30 \ge \theta,\ x_1 + x_3 - 30 \ge \theta.$$

Let $\alpha^2$ be the next optimal value. We can check that $\alpha^2 = 25$. Now our constraints give

$$x_1 = 10, \quad x_1 + x_2 \ge 30 + \alpha^2 = 55, \quad x_1 + x_3 \ge 30 + \alpha^2 = 55.$$

Thus $x_2 \ge 45$, $x_3 \ge 45$. Since $x_2 + x_3 = 90$ and $x_1 = 10$, this shows that $x_2 = 45$, $x_3 = 45$. We get $X^2 = X^3 = \{(10, 45, 45)\}$, $\Sigma_2 = (\{1, 2\}, \{1, 3\})$ and $\Sigma_3 = (\{2\}, \{3\})$ and we have reached the nucleolus $= X^3 = (10, 45, 45)$.


Example 9 


Consider the game with $v(1) = v(2) = v(3) = 0$, $v(1, 2) = v(1, 3) = 0$, $v(2, 3) = 10$, $v(1, 2, 3) = 6$. Find the nucleolus.

Remark 4.6 We notice $v(1\,2\,3) < v(2\,3)$. In fact this game has empty core. We can solve the LP

$$\max \theta$$
$$x_1 \ge \theta,\ x_2 \ge \theta,\ x_3 \ge \theta,\ x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,$$
$$x_1 + x_2 \ge \theta,\ x_1 + x_3 \ge \theta \ \text{ and } \ x_2 + x_3 - 10 \ge \theta,\ x_1 + x_2 + x_3 = 6,$$

giving optimal $\theta = -4$. Thus, the first LP's optimal value is $\alpha^1 = -4$ and $\Sigma_1 = \{\{2, 3\}\}$, $\Sigma_2 = \{\{1\}\}$ and $\Sigma_3 = \{\{2\}, \{3\}, \{1, 2\}, \{1, 3\}\}$, giving $X^1 = X^2 = \{x \in X^0 : x_1 = 0\}$ and $X^3 = \{(0, 3, 3)\}$. This is the nucleolus.
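The answers of Examples 8 and 9 can be cross-checked without an LP solver. The sketch below is a brute-force stand-in for the sequence of LPs, not the method of the text: it scans all imputations with non-negative integer coordinates and keeps the one whose increasingly ordered satisfaction vector is lexicographically largest. It can only find the nucleolus when that point happens to have integer coordinates, which is the case in both examples; the function names are hypothetical.

```python
from itertools import combinations

def ordered_satisfactions(x, v):
    """Satisfactions x(S) - v(S) over proper non-empty coalitions, increasing."""
    n = len(x)
    return sorted(sum(x[i - 1] for i in S) - v[S]
                  for size in range(1, n)
                  for S in combinations(range(1, n + 1), size))

def grid_nucleolus(v, total):
    """Lexicographically best integer imputation of a 3-player game with
    zero singleton worths (so imputations are the non-negative splits of total)."""
    candidates = [(a, b, total - a - b)
                  for a in range(total + 1) for b in range(total + 1 - a)]
    return max(candidates, key=lambda x: ordered_satisfactions(x, v))

ex8 = {(1,): 0, (2,): 0, (3,): 0, (1, 2): 30, (1, 3): 30, (2, 3): 80}
ex9 = {(1,): 0, (2,): 0, (3,): 0, (1, 2): 0, (1, 3): 0, (2, 3): 10}
print(grid_nucleolus(ex8, 100), grid_nucleolus(ex9, 6))
```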
The problem involves solving a sequence of linear programs and each iteration leads to at least one new coalition that has constant satisfaction on the convex subset of the imputation set. The number of iterations may grow exponentially, and the LPs considered each time may have an exponential number of linear inequality constraints of the type $x(S) - v(S) \ge \theta$ where the $S$'s vary over a subcollection of coalitions. In general the problem becomes NP-hard.

We will concentrate on special classes of TU games $(v, N)$ where the size of the data defining the game is polynomial in the number of players. For example, we saw in the horse market, the only coalitions that matter are buyer-seller pairs.

We are ready to introduce the so-called assignment games. 


4.5 Assignment Games [27] 


Assignment games with side payments are models of certain two-sided markets. One 
such example was already discussed in the form of a horse market where all horses 
are identical except for price expectation by sellers and ceiling prices by buyers. A 
clear motivation can be given by the so-called real estate game. 

House owners H_1 and H_2 are keen on finding buyers who would pay at least
$200K and $230K for their houses. A real estate agent is approached by them, and
they are willing to give a commission proportional to the price gain. Two buyers who
each want to buy just one house also approach the same agent. The agent wants to know
their budgets. The buyers say that it will depend on the house. The agent shows the
two houses and buyer 1 values H_1 as at most $290K and values H_2 as at most $250K.
Buyer 2 values H_1 as at most $300K and values house H_2 as at most $280K.

They are willing to give commission proportional to price savings. Thus, if $H_1$ is sold to $B_2$ at a price $p$ with $200 \le p \le 300$, seller $H_1$ gains $p - 200$ and buyer $B_2$ gains $300 - p$. We will assume the commission to the agent is $k(p - 200) + k(300 - p) = k(300 - 200)$. W.l.g. $k = 1$ and the sale fetches a commission $= 300 - 200$ for the real estate agent. Thus, the following matrix summarizes these as coalitional worths:

                Buyer B_1    Buyer B_2
Seller H_1         90           100
Seller H_2         20            50

We can represent

$v(H_1, B_1) = 90$, $v(H_1, B_2) = 100$,
$v(H_2, B_1) = 20$, $v(H_2, B_2) = 50$.

Clearly, the other one- and two-player coalitions have worth zero: $v(H_1, H_2) = v(B_1, B_2) = v(H_1) = v(H_2) = v(B_1) = v(B_2) = 0$.
Clearly, we can define $v(H_1, H_2, B_1) = \max(90, 20) = \max(v(H_1, B_1), v(H_2, B_1))$.
Similarly, $v(B_1\, B_2\, H_2) = \max(v(H_2 B_1), v(H_2 B_2)) = \max(20, 50) = 50$.

Finally, $v(H_1, H_2, B_1, B_2)$ = the optimal assignment, which in this case is $v(H_1, B_1) + v(H_2, B_2) = 140$.

Suppose we denote the single seller and single buyer coalitions as the rows (seller) and columns (buyer) of a matrix and the satisfaction at any imputation for each seller-buyer pair as the matrix entries. For example, consider

          v_1    v_2
u_1  (    90    100  )
u_2  (    20     50  )

We restrict $u$'s and $v$'s to satisfy $u_1 \ge 0$, $u_2 \ge 0$, $v_1 \ge 0$, $v_2 \ge 0$, $(u_1 + v_1) + (u_2 + v_2) = 140$. Since $v(H_1) = v(H_2) = v(B_1) = v(B_2) = 0$, the singleton coalition satisfactions are also the same as $u_1, u_2, v_1, v_2$. However $f(H_1, B_1) = u_1 + v_1 - v(H_1, B_1) = u_1 + v_1 - 90$. Similarly, $f(H_1, B_2) = u_1 + v_2 - 100$, $f(H_2, B_1) = u_2 + v_1 - 20$, $f(H_2, B_2) = u_2 + v_2 - 50$. For example, we can reward all the coalitional worth to sellers, after $H_1 B_1$ and $H_2 B_2$ are paired. We get say

          0      0
90   (    0    -10  )
50   (   30      0  )

Thus, we notice that the coalitional satisfactions for singletons and seller-buyer pairs are given above.


Theorem 4.10 The nucleolus is determined only by singleton satisfaction and buyer- 
seller coalitional satisfaction. (For a proof, see Solymosi and Raghavan [29]) 


The imputation $(90, 50; 0, 0)$ has a satisfaction vector for all singleton and buyer-seller coalitions given by $(90, 50, 0, 0, 0, -10, 30, 0)$. The rest of the satisfactions are irrelevant to nail down the nucleolus. Rearranging the vector above in increasing order, we get the vector $(-10, 0, 0, 0, 0, 30, 50, 90)$. Thus, the worst hit coalition is $(H_1, B_2)$. The worth of the grand coalition is the worth of the optimal assignment $(H_1 B_1, H_2 B_2)$. We know that since the game has a non-empty core and since the nucleolus is in the core, satisfaction will not change for any imputation in the core for not only the grand coalition but also for those which satisfy $f(x, H_1 B_1) = 0$, $f(x, H_2 B_2) = 0$. We call them settled coalitions where satisfaction is constant over the core and it is currently 0.

We compute

$$\min_{S \ne (H_1, B_1), (H_2, B_2)} f(u_1, v_1, u_2, v_2, S) = -10.$$


The next intuitive step is to move from the current imputation to a new imputation 
in a chosen direction that improves the satisfaction of the worst-hit coalition. 

Suppose we chose the direction $d = (0, 1; 0, -1)$. For $\beta = -10$, this gives a new imputation

$$(90, 50; 0, 0) - 10(0, 1; 0, -1) = (90, 40; 0, 10).$$

Observe that satisfactions of singleton and buyer-seller coalitions are non-negative. It is also a core imputation. In reality we have reached an extreme corner of the core. Suppose we choose the new direction $(1, 2; -1, -2)$. Again with $\beta = -10$, we get a new imputation $(90, 40; 0, 10) - 10(1, 2; -1, -2) = (80, 20; 10, 30)$. We can check that this new imputation gives a satisfaction vector for singletons and buyer-seller coalitions $(80, 20; 10, 30, 0, 10, 10, 0)$ which when arranged in increasing order gives $(0, 0, 10, 10, 10, 20, 30, 80)$.
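The satisfaction vectors along this path can be recomputed directly. A minimal sketch, assuming the worth matrix with entries 90, 100, 20, 50 used above and listing satisfactions in the order $(u_1, u_2, v_1, v_2, f_{11}, f_{12}, f_{21}, f_{22})$; the names are hypothetical.

```python
# A[i][j]: worth of the pair (seller H_{i+1}, buyer B_{j+1}).
A = [[90, 100],
     [20, 50]]

def satisfactions(u, v):
    """Satisfactions of the four singletons followed by the four seller-buyer pairs."""
    singles = list(u) + list(v)
    pairs = [u[i] + v[j] - A[i][j] for i in range(2) for j in range(2)]
    return singles + pairs

path = [((90, 50), (0, 0)), ((90, 40), (0, 10)), ((80, 20), (10, 30))]
ordered = [sorted(satisfactions(u, v)) for u, v in path]
for vec in ordered:
    print(vec)

# Each step improves the ordered vector lexicographically.
print(ordered[0] < ordered[1] < ordered[2])
```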

Now moving in the direction (1, 1; —1, —1) with 8 = —10 gives (80, 20, 10, 30) 
— 10€1, 1; —1, —1) = (70, 10, 20, 40), and this imputation gives satisfaction vector 
for singleton and buyer-seller coalitions (70, 10; 20, 40, 0, 0, 10, 10) which when 
arranged in increasing order gives 


(0, 0, 10, 10, 10, 20, 40, 70) > (0, 0, 10, 10, 10, 20, 30, 50). 


In fact we have reached the nucleolus. To claim that this is the nucleolus has to do 
with the nature of the core in assignment games. 
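
The sorted satisfaction vectors quoted above can be reproduced mechanically. The following Python sketch is only an illustration: the 2 × 2 worth matrix `a` is not stated in this passage and is an assumption, chosen because it reproduces the three vectors in the text, and `sorted_satisfactions` is a hypothetical helper name.

```python
def sorted_satisfactions(u, v, a):
    """Satisfactions of all singleton and seller-buyer coalitions, in increasing order."""
    singles = list(u) + list(v)                      # a player alone is satisfied by his payoff
    pairs = [u[i] + v[j] - a[i][j]                   # f_ij = u_i + v_j - a_ij
             for i in range(len(u)) for j in range(len(v))]
    return sorted(singles + pairs)

# ASSUMED worths a[i][j] of the pair (H_{i+1}, B_{j+1}); inferred, not quoted from the text.
a = [[90, 100],
     [20, 50]]

start = sorted_satisfactions((90, 50), (0, 0), a)
second = sorted_satisfactions((80, 20), (10, 30), a)
third = sorted_satisfactions((70, 10), (20, 40), a)

# Python compares lists lexicographically, which is the order used for the nucleolus.
print(start)
print(second)
print(third, third > second > start)
```

Each move along a direction raises the first coordinate in which the sorted vectors differ, which is all the lexicographic comparison looks at.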

So far we chose the first direction, and then each new direction, with no real
explanation. We also terminated the calculation without giving any reason why the
algorithm stops at this point.

For assignment games, a method based on the linear programming technique 
is not well suited since the combinatorial structure of the characteristic function 
cannot be effectively translated into a continuous problem formulation. When solving 
the optimal assignment problem, the ordinary primal simplex method encounters 
difficulties caused by degeneracy. It is outperformed by well-known algorithms like
Kuhn's Hungarian method (see [4]).

In an assignment game, the first step is to define the worth of the grand coalition. 
Intuitively, it corresponds to the optimal pairing of the sellers and buyers by the 
real estate agent that will fetch him the maximal commission. This occurs for the 
value of the optimal assignment. Suppose we associate a graph whose vertices are the
matched pairs of the optimal assignment. It is quite possible that the assignment could
either keep some houses unsold or leave some buyers unable to buy any house. By
introducing another dummy vertex, we can associate with any m-seller, n-buyer
assignment game with k = min(m, n) a graph with k + 1 vertices, where the dummy vertex
accounts for a seller who is unable to sell his house (when k < m) or a buyer who is
unable to buy any house (when k < n).

In such a situation, we introduce a dummy vertex 0 so that we have k + 1 vertices
0, 1, 2, ..., k. If we have different optimal assignments, then a player common to two
different optimal assignments gives us an option to connect two distinct
assignments with an edge. Such a graph can be decomposed into several connected
components (i.e. maximal connected subgraphs). Having formed such connected
components, the key problem is to find more binding constraints between players related
to different components that would introduce new edges to the graph and reduce the
number of components.
Without loss of generality, we can assume that there are enough buyers to buy all
the houses in the market. That is, $m = |M| \le |N| = n$, where $M$ is the set of real house
sellers and $N$ is the set of real house buyers. It will prove convenient to introduce
a fictitious seller and a fictitious buyer who are present in any coalition. Thus
$M^0 = M \cup \{0\}$ and $N^0 = N \cup \{0\}$. We can think of any coalition $(S, T)$ as
$S \subseteq M^0$, $0 \in S$ and $T \subseteq N^0$, $0 \in T$. With this notation, we write
$(0, 0)$ for the empty coalition, $(i, 0)$ for the row player $i \in M$ and $(0, j)$ for the
column player $j \in N$. We assume $a_{00} = a_{i0} = a_{0j} = 0$ as the coalitional worth
$\forall\, i \in M,\ j \in N$ in the augmented profit matrix $A = (a_{ij})_{(M^0, N^0)}$. The
payoff of the fictitious players is fictitious too, with $u_0 = v_0 = 0$ in
any payoff vector $(u, v) = (u_0, u_1, \ldots, u_m;\ v_0, v_1, \ldots, v_n)$.

By an $(S, T)$ matching $\mu$, we mean a correspondence between $S$ and $T$ that
induces a matching on $(S \setminus \{0\},\, T \setminus \{0\})$ and satisfies $(0, 0) \in \mu$. Any
player who is not matched to a real player is tied to a fictitious player.

Thus, the characteristic function of an assignment game with augmented profit
matrix $A = (a_{ij})$ is given by

$$v(S, T) = \max \Big\{ \sum_{(i, j) \in \mu} a_{ij} \ :\ \mu \text{ is an } (S, T) \text{ matching} \Big\}.$$

Since $a_{ij} \ge 0\ \forall\, i, j$, we can assume that $\mu$ is a full $(S, T)$ matching. That is,
the match between the real players in $S$ and $T$ exhausts the real players of either $S$ or $T$.

Formally, let $f_{ij}(u, v)$ be the satisfaction of coalition $(i, j)$ when players are paid
according to $(u, v)$ (here $i$ = seller $i$, $j$ = buyer $j$). That is, $f_{ij}(u, v) = u_i + v_j - a_{ij}$.


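Theorem 4.11 The core $C$ of an assignment game $A(M, N)$ is given by

$$C = \Big\{ (u, v) \ :\ \sum_{(i, j) \in \sigma} f_{ij}(u, v) = 0, \quad f_{ij}(u, v) \ge 0 \ \ \forall\, (i, j) \in (M, N) \Big\},$$

where the sum is taken over the pairs $(i, j)$ of an optimal matching $\sigma$.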


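Shapley and Shubik [27] showed that every element $(u, v)$ in $C$ is an optimal solution
to the dual of the linear program

$$\max \sum_{i \in M} \sum_{j \in N} a_{ij} x_{ij}$$

subject to

$$\sum_{j \in N} x_{ij} \le 1 \ \ \forall\, i \in M, \qquad \sum_{i \in M} x_{ij} \le 1 \ \ \forall\, j \in N, \qquad x_{ij} \ge 0 \ \ \forall\, (i, j).$$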

At the same time, the dual to this primal is

$$\min \sum_{i \in M} u_i + \sum_{j \in N} v_j$$

such that

$$u_i + v_j \ge a_{ij} \ \ \forall\, i \in M,\ j \in N, \qquad u_i,\ v_j \ge 0 \ \ \forall\, i, j.$$


The maximum for the primal is also the same as $\max_{\sigma} \sum_i a_{i\sigma(i)}$, where $\sigma$ is any
permutation. This is a consequence of the following theorem of Birkhoff and von
Neumann [4].


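Theorem 4.12 For the convex set $X$ of all $n \times n$ doubly stochastic matrices, i.e.

$$X = \Big\{ (x_{ij}) \ :\ x_{ij} \ge 0,\ \ \sum_j x_{ij} = \sum_i x_{ij} = 1 \Big\},$$

its set of extreme points is the set of all permutation matrices

$$Q_n = \Big\{ p^{\sigma} \ :\ p^{\sigma}_{ij} = 1 \text{ if } j = \sigma(i), \quad p^{\sigma}_{ij} = 0 \text{ if } j \neq \sigma(i) \Big\},$$

where $\sigma : (1, 2, \ldots, n) \to (\sigma(1), \sigma(2), \ldots, \sigma(n))$ is a permutation of $\{1, 2, \ldots, n\}$.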
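
Because the extreme points of the feasible set are permutation matrices, the primal optimum of a small square game can be found by simply enumerating permutations. The Python sketch below does this by brute force; the function name `best_assignment` is hypothetical, and the 3 × 3 matrix is the one used in Example 10 later in this section. It is meant as a check for tiny instances, not as a replacement for the Hungarian method.

```python
from itertools import permutations

def best_assignment(a):
    """Return (value, sigma) maximizing sum_i a[i][sigma[i]] over all permutations."""
    n = len(a)
    best = max(permutations(range(n)),
               key=lambda sigma: sum(a[i][sigma[i]] for i in range(n)))
    return sum(a[i][best[i]] for i in range(n)), best

# Profit matrix of Example 10 (rows: sellers, columns: buyers).
a = [[6, 7, 7],
     [0, 5, 6],
     [2, 5, 8]]

value, sigma = best_assignment(a)
print(value, sigma)   # the diagonal pairing, with worth 6 + 5 + 8
```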


By complementary slackness, this says that optimally matched pairs share the
exact coalitional worth in any core allocation. That is, for an optimal $(u^*, v^*)$ of the
dual, $u^*_i + v^*_j = a_{ij}$ when $j = \sigma^*(i)$ and $p^{\sigma^*}$ is optimal for the primal. But unmatched
players receive nothing in any core allocation.
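
The description of the core as the set of optimal dual solutions translates directly into a membership test. The following Python sketch assumes a square game (so that every real player is matched) and uses the hypothetical name `in_core`; the matrix is again the one from Example 10, and the two payoff vectors are chosen only for illustration.

```python
from itertools import permutations

def in_core(u, v, a):
    """Dual feasibility plus efficiency: (u, v) >= 0, u_i + v_j >= a_ij, total = optimum."""
    n = len(a)
    optimum = max(sum(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))
    feasible = (min(u) >= 0 and min(v) >= 0 and
                all(u[i] + v[j] >= a[i][j] for i in range(n) for j in range(n)))
    return feasible and sum(u) + sum(v) == optimum

a = [[6, 7, 7],
     [0, 5, 6],
     [2, 5, 8]]

# Everything given to the sellers: efficient, but the pair (1, 2) gets 6 + 0 < 7.
print(in_core((6, 5, 8), (0, 0, 0), a))
# A payoff in which every seller-buyer pair receives at least its worth.
print(in_core((5, 2, 3), (1, 3, 5), a))
# Complementary slackness: the optimally matched (diagonal) pairs split their worth exactly.
print(all(u_i + v_i == a[i][i] for i, (u_i, v_i) in enumerate(zip((5, 2, 3), (1, 3, 5)))))
```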

The lexicographic center for assignment games gives a truly efficient iterative 
process to reach the nucleolus in polynomial time. The reasons are many: 


(i) The combinatorial nature of assignments 
(ii) An easy way to choose a direction in each iteration 
(iii) Strictly shrinking set of imputations in each iteration with the dimension of the 
convex subset of imputation getting reduced by at least one after each iteration 
(iv) The lattice structure of the core. 


We can essentially ignore all but the singleton and buyer-seller coalitions when
applying the Maschler-Peleg-Shapley iteration process, as follows (see Maschler et al.
[20], Solymosi and Raghavan [29]).

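Let $\sigma$ be a fixed optimal $(M, N)$ matching for an assignment game $A(M, N)$. We
construct a sequence $(\Delta^0, \Sigma^0), (\Delta^1, \Sigma^1), \ldots, (\Delta^{\rho+1}, \Sigma^{\rho+1})$ of partitions of
$(M, N)$ such that $\Sigma^0 \supset \Sigma^1 \supset \cdots \supset \Sigma^{\rho+1} = \emptyset$, and a nested sequence
$X^0 \supset X^1 \supset \cdots \supset X^{\rho+1}$ of sets of payoff vectors, iteratively defined as follows.
Let $\Delta^0 = \sigma$ and $\Sigma^0 = (M, N) \setminus \sigma$. Let us start with the imputation $(u^0, v^0)$ where
$u^0_i = a_{i\sigma(i)}\ \forall\, i \in M$ and $v^0_j = 0$, $j \in N$. Let

$$\alpha^0 = \min_{(i, j) \in \Sigma^0} f_{ij}(u^0, v^0) = \min_{(i, j) \in \Sigma^0} \big( u^0_i + v^0_j - a_{ij} \big).$$

To start with,

$$X^0 = \big\{ (u, v) : (u, v) \ge (0, 0),\ \ u_i + v_j - a_{ij} = 0 \text{ if } (i, j) \in \Delta^0,\ \ u_i + v_j - a_{ij} \ge \alpha^0\ \forall\, (i, j) \in \Sigma^0 \big\}.$$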

We define recursively for $r = 0, 1, 2, \ldots, \rho$:

(i) $\alpha^{r+1} = \max_{(u, v) \in X^r}\ \min_{(i, j) \in \Sigma^r} (u_i + v_j - a_{ij})$

(ii) $X^{r+1} = \{ (u, v) \in X^r : \min_{(i, j) \in \Sigma^r} (u_i + v_j - a_{ij}) = \alpha^{r+1} \}$

(iii) $\Sigma_{r+1} = \{ (i, j) \in \Sigma^r : u_i + v_j - a_{ij} \text{ is a constant on } X^{r+1} \}$

(iv) $\Sigma^{r+1} = \Sigma^r \setminus \Sigma_{r+1}$, $\ \Delta^{r+1} = \Delta^r \cup \Sigma_{r+1}$.

Here $\rho$ is the last index $r$ for which $\Sigma^r \neq \emptyset$. We call $X^{\rho+1}$ the lexicographic
center of $A(M, N)$.


Remark 4.7 The lexicographic center is independent of the choice of optimal
assignment. In fact, if $\alpha^0 \le 0$, then the core $C \subset X^0$. In case $\alpha^0 \ge 0$, we have $X^0 \subset$ core.
Further, $\alpha^1 \ge 0$ and $X^1 \subset$ core. Also, when $\alpha^1 = 0$, $X^1 = C$ (for proof details,
see Solymosi and Raghavan [29]). Thus at the end of the first iteration, we always
reach the core.


Lemma 4.4 For $0 \le r \le \rho$, if $(u, v) \in X^{r+1}$ and $(y, z) \notin X^{r+1}$ and $(y, z) \in X^r$,
then lexicographically $(u, v) \succ (y, z)$.


Proof Same as Lemma 4.3 for this special case. 


Theorem 4.13 The lexicographic center of an assignment game coincides with the 
nucleolus. 


Proof Let $X^{\rho+1} = \{(\bar u, \bar v)\}$. We have $(\bar u, \bar v) \succ (y, z)$ for every imputation $(y, z) \neq (\bar u, \bar v)$.
Therefore the nucleolus is $\nu = (\bar u, \bar v)$.


At any stage of the iteration, we have a particular partition $\Delta^r = \Delta$, $\Sigma^r = \Sigma$ of $(M, N)$
and the set $X^r = X$, the current feasible payoff set. We call $\Delta$ the settled coalitions, and $\Sigma$
the unsettled coalitions. A coalition $(i, j) \in \Delta$ is settled because its
satisfaction is constant over the entire feasible set $X$. A coalition $(i, j) \in \Sigma$ is called
unsettled because the satisfaction of $(i, j)$ is not constant over the entire set
$X$.

Suppose $(i, j) \in \Delta$ and also suppose $(i, \sigma(k)) \in \Delta$, where $k$ is a seller and $\sigma(k)$
is the optimal buyer of house $k$. If so, we have the following important feature of the
settled-unsettled partition.


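Lemma 4.5 (i) $(i, \sigma(k)) \in \Delta \implies (k, \sigma(i)) \in \Delta$
(ii) $(i, \sigma(k)) \in \Delta,\ (k, \sigma(j)) \in \Delta \implies (i, \sigma(j)) \in \Delta$ for all $i, j, k \in M$ (sellers).

Proof If $i \in M$ (a real seller), there is a unique partner $\sigma(i) \in N$ (a real buyer). For
$0 \in M$, $\sigma(0) = \{0\} \cup N \setminus \sigma(M)$, and so each column is the partner of a row. We have also
assumed $|M| \le |N|$.
To show (i), observe that $f_{i\sigma(k)}(u, v) = u_i + v_{\sigma(k)} - a_{i\sigma(k)}$ = constant over $X$ (by
assumption). Also $f_{i\sigma(i)} = u_i + v_{\sigma(i)} - a_{i\sigma(i)} = 0$ on $X^0 \supset X$. Thus

$$f_{i\sigma(i)} + f_{k\sigma(k)} - f_{i\sigma(k)} = [u_i + v_{\sigma(i)} - a_{i\sigma(i)}] + [u_k + v_{\sigma(k)} - a_{k\sigma(k)}] - [u_i + v_{\sigma(k)} - a_{i\sigma(k)}]$$
$$= \text{const } C_1 + \text{const } C_2 - \text{const } C_3 = u_k + v_{\sigma(i)} - [a_{i\sigma(i)} + a_{k\sigma(k)} - a_{i\sigma(k)}]$$

over the whole set $X$. Thus $f_{k\sigma(i)} = u_k + v_{\sigma(i)} - a_{k\sigma(i)}$ = const on the whole set $X$.
That is, $(k, \sigma(i)) \in \Delta$ when $(i, \sigma(k)) \in \Delta$.
(ii) The argument is similar.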


Remark 4.8 We notice that (i) and (ii) induce the equivalence relation $\sim$ where
$i \sim k$ if and only if $(i, \sigma(k)) \in \Delta$. Thus, the rows $i$ and $k$ (sellers $i$ and $k$) are tied
by such constraints over the entire set $X$. (We did not check $i \sim i$, as it is the case by
definition.)


Let us decompose the sellers and buyers into equivalence classes $M_0, M_1, \ldots, M_d$
and the equivalence classes of partners $N_0, N_1, \ldots, N_d$, where $N_p = \sigma(M_p)$, $0 \le p \le d$.
We stipulate $0 \in M_0$ and so $0 \in N_0$. We say $i \sim k$ if and only if $\sigma(i) \sim \sigma(k)$.
Thus, such a partition of $M$ and $N$ induces the partition of $(M, N)$ with blocks,
namely $(M, N) = \bigcup_{0 \le p, q \le d} (M_p, N_q)$. The following is the key theorem crucial for our
algorithm.

Theorem 4.14 The settled-unsettled partition $(\Delta, \Sigma)$ satisfies

(i) $\Delta = \bigcup_{0 \le p \le d} (M_p, N_p)$ and hence $\Sigma = \bigcup_{0 \le p \neq q \le d} (M_p, N_q)$
(ii) If $(i, j), (k, l) \in (M_p, N_q)$, then $f_{ij} - f_{kl}$, the difference in their satisfactions,
is a constant over the whole set $X$.


Proof (i) If $(i, j) \in (M_p, N_p)$, then $\exists\, k \in M_p$ such that $j = \sigma(k)$. Since $i \sim k$, we
have $(i, j) = (i, \sigma(k)) \in \Delta$; hence $\Delta = \bigcup_{0 \le p \le d} (M_p, N_p)$.

To prove (ii), we claim that when $i \sim k$ and $i, k \in M$, then $u_i - u_k$ = constant on $X$.
If $i \sim k$, then $(i, \sigma(k)) \in \Delta$, i.e. $f_{i\sigma(k)} = u_i + v_{\sigma(k)} - a_{i\sigma(k)}$ = const on $X$. By
subtracting $f_{k\sigma(k)}$ we get $u_i - u_k$ = const on $X$. Similarly, if columns $j$ and $l$ are tied, then
$v_j - v_l$ = constant on $X$. Thus, if $(i, j), (k, l) \in (M_p, N_q)$, then $i \sim k$ and $j \sim l$, and
$(f_{ij} - f_{kl})(u, v) = (u_i - u_k) + (v_j - v_l) - (a_{ij} - a_{kl})$ = const on $X$. As a
consequence, if $\Delta = \Delta^r = (M_0, N_0) \cup \cdots \cup (M_d, N_d)$ and $r \ge 1$, then $\dim X^r = d$. In fact,
linear functions of the form $(f_{i0} - f_{k0})(u, v) = u_i - u_k$, $i, k \in M_p$, are constant on
$X = X^r$. Thus for each $p$ we can choose $|M_p| - 1$ linearly independent ones. In
the settled class $M_0$, the functions $u_i = u_i - u_0$ are constant on $X^r$. Thus we have
$|M_0| + \sum_{1 \le p \le d} (|M_p| - 1)$ linearly independent linear functions that are constant on
$X^r = X$. Thus $\dim(X^r) \le d$.

We claim $\dim(X^r) \ge d$. To see this, note that (for $r \ge 1$) the functions $f_{i0}(u, v) = u_i$,
$i \in \bigcup_{1 \le p \le d} M_p$, are non-constant on $X^r$.

Thus all we can say is that the satisfaction of tied players must change by the
same amount. However, there is no binding relationship between players of different
classes. Thus, we have $d$ linearly independent vectors in $X^r$ and $\dim X^r \ge d$.


The following lemma helps us to locate the coordinates of the nucleolus one by
one in the iterative process. This is due to $X^r$ possessing lattice type properties.

Lemma 4.6 Let $f_{ij}(\nu)$ be the satisfaction of coalition $(i, j)$ at the nucleolus
imputation $\nu$. Let $f_{ij}(\nu) = c_{ij}\ \forall\, (i, j) \in (M, N)$. Then for $r \ge 1$,

$$X^r = Y^r := \big\{ (u, v) : f_{ij}(u, v) = c_{ij}\ \forall\, (i, j) \in \Delta^r,\ \ f_{ij}(u, v) \ge \alpha^r\ \forall\, (i, j) \in \Sigma^r \big\}.$$


See [29] for a proof based on induction. 

We can extend the definition of $Y^r$ to the case $r = 0$ by
$X^0 = Y^0 \cap \{(u, v) : (u, v) \ge (0, 0)\}$. Thus for $0 \le r \le \rho$ and $\alpha^r \le \alpha \le \alpha^{r+1}$, we
can define

$$Y(r, \alpha) = \big\{ (u, v) : f_{ij}(u, v) = c_{ij}\ \forall\, (i, j) \in \Delta^r,\ \ f_{ij}(u, v) \ge \alpha\ \forall\, (i, j) \in \Sigma^r \big\}.$$

As $\alpha$ increases, $Y(r, \alpha)$ shrinks. For the maximum value $\alpha^{r+1}$ of $\alpha$, $Y(r, \alpha) \neq \emptyset$.
Even the sets $Y(r, \alpha)$ possess the following lattice type property: if
$(u^1, v^1), (u^2, v^2) \in Y(r, \alpha)$, so does $(u^1 \vee u^2, v^1 \wedge v^2)$. Similarly,
$(u^1 \wedge u^2, v^1 \vee v^2) \in Y(r, \alpha)$. The proof is easy and is omitted.

Thus for each $\alpha$, we have two special vertices, $u_i = u_i(r, \alpha)$ as the maximum
payoff for each $i \in M$ and $v_j = v_j(r, \alpha)$ as the minimum payoff for $j \in N$. These
two special vertices are the only ones we concentrate on for our algorithm.
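
The lattice type property can be checked numerically on a set of the same shape, namely the core (matched pairs tight, every other satisfaction at least 0). The Python sketch below is an illustration under that assumption: it takes two core payoffs of the 3 × 3 game of Example 10, chosen by hand so that neither dominates the other, and verifies that the componentwise join and meet stay in the core. The helper name `in_core_diag` is hypothetical and assumes the optimal assignment is the diagonal.

```python
def in_core_diag(u, v, a):
    """Core test when the diagonal is optimal: matched pairs tight, all f_ij >= 0, payoffs >= 0."""
    n = len(a)
    tight = all(u[i] + v[i] == a[i][i] for i in range(n))
    covered = all(u[i] + v[j] >= a[i][j] for i in range(n) for j in range(n))
    return tight and covered and min(u) >= 0 and min(v) >= 0

a = [[6, 7, 7],
     [0, 5, 6],
     [2, 5, 8]]

u1, v1 = (5, 3, 3), (1, 2, 5)
u2, v2 = (6, 2, 4), (0, 3, 4)

# (u1 v u2, v1 ^ v2): sellers take the larger payoff, buyers the smaller one.
join = (tuple(map(max, u1, u2)), tuple(map(min, v1, v2)))
# (u1 ^ u2, v1 v v2): the mirror image.
meet = (tuple(map(min, u1, u2)), tuple(map(max, v1, v2)))

print(in_core_diag(u1, v1, a), in_core_diag(u2, v2, a))
print(join, in_core_diag(*join, a))
print(meet, in_core_diag(*meet, a))
```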


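Definition 4.4 Fix any $0 \le r \le \rho$ and $\alpha^r \le \alpha \le \alpha^{r+1}$. Let $\Sigma = \Sigma^r$. Choose the
u-best corner $(u, v)$ of $X(r, \alpha)$. Among the unsettled coalitions in $\Sigma$, those in

$$\Sigma_{=} = \Sigma_{=}(r, \alpha) = \{ (i, j) \in \Sigma : f_{ij}(u, v) = \alpha \}$$

are called active unsettled coalitions. Those in $\Sigma_{>} = \Sigma_{>}(r, \alpha)$, with $f_{ij}(u, v) > \alpha$,
are called passive coalitions.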


The directed graph $G = G(r, \alpha)$: Fix any $0 \le r \le \rho$ and $\alpha^r \le \alpha \le \alpha^{r+1}$. Let
$d = d(r)$, where we have $d + 1$ equivalence classes $0, 1, 2, \ldots, d$. On these nodes,
we connect node $p$ to node $q$ by an arc whenever $(p, q)$ belongs to

$$E = \{ (p, q) : 0 \le p \neq q \le d,\ \ (M_p, N_q) \cap \Sigma_{=}(r, \alpha) \neq \emptyset \}.$$

The graph is proper if it is acyclic and has no incoming arcs to 0, and is called
improper otherwise.
The following is a fundamental result for the algorithm.

Theorem 4.15 For $\alpha^r \le \alpha \le \alpha^{r+1}$, $G$ is proper if and only if $\alpha < \alpha^{r+1}$.


Proof The proof relies on moving inside the current set of feasible solutions, with
one end at the u-best v-worst corner $(u, v)$, in the direction defined by the longest
paths. Let $l(p)$ be the length of the longest path ending in node $p$. Let $s_i = -l(p)$ if
$i \in M_p$, $\forall\, i \in M$, and $t_j = l(q)$ if $j \in N_q$, $\forall\, j \in N$. The payoff $(u', v') = (u, v) +
\beta (s, t)$ gives satisfaction $f_{ij}(u', v') = u_i + v_j + \beta (s_i + t_j) - a_{ij}$, which is unaltered
if $(i, j) \in \Delta$, as then $i \in M_p$, $j \in N_p$ (notice $s_i + t_j = 0$ if $(i, j) \in \Delta$). If $(i, j) \in \Sigma_{=}$ with
$i \in M_p$, $j \in N_q$, then $p \neq q$ and $l(p) + 1 \le l(q)$, and so $-l(p) + l(q) \ge 1$. Thus for small
$\beta > 0$, $f_{ij}(u', v') \ge \alpha + \beta$, so that $(u', v') \in X$ and $\alpha^{r+1} > \alpha$.

Suppose $G$ is improper. This could be due to either an arc from some $p$ to 0 or
a cycle in the graph. In what follows $(u, v)$ is the u-best corner and $(u', v')$ is an
arbitrary point of $X$. Suppose we have an arc $(p, 0)$. Thus $\exists\, i \in M_p$ and $j \in N_0$
such that $u_i + v_j - a_{ij} = \alpha$. Since $(0, j) \in (M_0, N_0) \subset \Delta$, $f_{0j}(u', v') = v'_j = c_{0j}$
$\forall\, (u', v') \in X$, i.e. $v'_j = c_{0j}$.

On the other hand, $u_i \ge u'_i$ implies $f_{ij}(u', v') \le f_{ij}(u, v) = \alpha\ \forall\, (u', v') \in X$. Thus
$\alpha^{r+1} \le \alpha$. In case of a cycle $(p, q), (q, r), (r, p)$ (say), we have $f_{i_1 j_1}(u, v) = \alpha$,
$f_{i_2 j_2}(u, v) = \alpha$ and $f_{i_3 j_3}(u, v) = \alpha$. The sum of these three satisfactions is a
constant on $X$, while each of them is at least $\alpha$ at any $(u', v')$, and we get

$$3\alpha = \sum_{\text{cycle}} f_{ij}(u, v) = \sum_{\text{cycle}} f_{ij}(u', v') \ge 3\alpha \implies f_{i_k j_k}(u', v') = \alpha \ \ \forall\, (u', v') \in X,$$

and so $\alpha^{r+1} \le \alpha$.

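As a consequence of the above theorem, if

$$\beta(r, \alpha) = \max \Big\{ \beta \ :\ \min_{(i, j) \in \Sigma} f_{ij}\big( (u, v) + \beta (s, t) \big) = \alpha + \beta \Big\},$$

then $(u, v) + \beta (s, t)$ is the u-best corner of $X(r, \alpha + \beta)$.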


The algorithm: We construct a finite sequence of payoff vectors where each one 
is the u-best corner of the current feasible set and the last one is the nucleolus. 
Finding the next payoff is translated to finding the longest path ending in each node 
which summarizes the interconnection between unsettled coalitions. It reveals certain 
relations which prevent any further improvement in their guaranteed satisfaction. 

Imagine a federal government distributing a part of the collected GST among
states. While individual states may grumble about what they get, a coalition of states,
say those ruled by one party, may complain of being discriminated against. The approach
of the federal government is to find the worst hit coalition under the current shares and
to redo the distribution, making sure that the coalitions next in line as the second
worst are not pulled down below the worst hit coalition's satisfaction
by the redistribution. Here is how the algorithm works:


(i) While $\Sigma \neq \emptyset$, do (ii) to (xii) (iteration $r$)

(ii) Build the graph $G := G(r, \alpha)$

(iii) While $G$ is proper, do (iv) to (ix) (step $(r, \alpha)$)

(iv) Find direction $(s, t)$

(v) Find step size $\beta := \beta(r, \alpha)$

(vi) Update arcs in $G$

(vii) Update payoff $(u, v) := (u, v) + \beta (s, t)$

(viii) Update satisfactions $f_{ij}$, $\forall\, (i, j) \in \Sigma$

(ix) Update guaranteed satisfaction level $\alpha := \alpha + \beta$

(x) Find the to-be-settled coalitions $\Sigma_{r+1}$

(xi) Update partition $\Sigma := \Sigma \setminus \Sigma_{r+1}$, $\Delta := \Delta \cup \Sigma_{r+1}$

(xii) Set $r := r + 1$.
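
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the game is square, its columns have already been reordered so that the optimal assignment is the diagonal, exact arithmetic is done with `fractions.Fraction`, and the graph is small enough for a naive transitive closure. The name `nucleolus_assignment` and the trick of adding helper arcs from node 0 (so that every class with a path into 0 is merged with 0 in the same pass as the cycles) are choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

def nucleolus_assignment(a):
    """Nucleolus (u, v) of a square assignment game whose optimal assignment is the diagonal."""
    n = len(a)
    A = [[0] * (n + 1)] + [[0] + list(row) for row in a]       # augmented profit matrix
    u = [Fraction(0)] + [Fraction(a[i][i]) for i in range(n)]  # u-best start: sellers take all
    v = [Fraction(0)] * (n + 1)
    cls = list(range(n + 1))       # cls[i]: equivalence class of the matched pair (i, i)

    def f(i, j):                   # satisfaction of coalition (seller i, buyer j)
        return u[i] + v[j] - A[i][j]

    while len(set(cls)) > 1:       # some coalitions are still unsettled
        nodes = sorted(set(cls))
        unsettled = [(i, j) for i, j in product(range(n + 1), repeat=2) if cls[i] != cls[j]]
        alpha = min(f(i, j) for i, j in unsettled)
        arcs = {(cls[i], cls[j]) for i, j in unsettled if f(i, j) == alpha}

        # Transitive closure (k is the outermost index); helper arcs 0 -> q are added.
        reach = {(p, q): p == 0 or (p, q) in arcs for p in nodes for q in nodes}
        for k, p, q in product(nodes, repeat=3):
            if reach[p, k] and reach[k, q]:
                reach[p, q] = True
        rep = {p: min(q for q in nodes if q == p or (reach[p, q] and reach[q, p]))
               for p in nodes}
        if any(rep[p] != p for p in nodes):    # improper graph: melt classes, settle coalitions
            cls = [rep[c] for c in cls]
            continue

        length = dict.fromkeys(nodes, 0)       # longest path ending in each node of the DAG
        for _ in nodes:
            for p, q in arcs:
                length[q] = max(length[q], length[p] + 1)

        def rate(i, j):                        # s_i + t_j, growth of f(i, j) per unit of beta
            return length[cls[j]] - length[cls[i]]

        # Largest beta keeping every unsettled satisfaction at or above alpha + beta.
        beta = min((f(i, j) - alpha) / (1 - rate(i, j))
                   for i, j in unsettled if rate(i, j) < 1)
        for i in range(n + 1):                 # move along (s, t) = (-length, +length)
            u[i] -= beta * length[cls[i]]
            v[i] += beta * length[cls[i]]
    return u[1:], v[1:]

print(nucleolus_assignment([[6, 7, 7], [0, 5, 6], [2, 5, 8]]))
```

For the 3 × 3 profit matrix of Example 10 the sketch goes through the step sizes 1, 1 and 1/3 before all classes are melted into 0. A rectangular game, such as the 4-seller, 5-buyer illustration that follows, would additionally need the unmatched buyers placed in class 0 from the start; that is not handled here.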


An illustrative example: There are 4 sellers (represented by rows i = 1, 2, 3, 4).
There are 5 buyers (represented by columns j = 1, 2, 3, 4, 5). We introduce a dummy
buyer 0 as well as a dummy seller 0. Thus, any singleton coalition is identified as a
buyer-seller coalition with a dummy seller or dummy buyer tied to them. Thus, we
have the following augmented matrix with 5 rows and 6 columns. Since the
equivalence classes are p = 0, 1, 2, 3, 4, we also indicate the equivalence class to
which each buyer j or seller i belongs, and we have the following matrix. The grand
coalitional worth is the worth of the optimal assignment, which corresponds to the
boxed pairings.

                 class q:   0    0    1    2    3    4
                 buyer j:   0    1    2    3    4    5
    p = 0, i = 0            0   [0]   0    0    0    0
    p = 1, i = 1            0    6   [7]   4    5    9
    p = 2, i = 2            0    4    3   [7]   8    3
    p = 3, i = 3            0    0    1    3   [6]   4
    p = 4, i = 4            0    2    2    5    7   [8]


We initiate our algorithm with all the coalitional worth given to the sellers and
nothing to the buyers. Thus, outside the real coalitional matrix we have sellers i =
1, 2, 3, 4 receiving $u = (7, 7, 6, 8)^T$ and buyers receiving $v = [0\ 0\ 0\ 0\ 0]$.

The optimal assignment is (1, 2), (2, 3), (3, 4), (4, 5) between real seller-buyer
pairs. They are the boxed entries. Buyer j = 1 is tied to the dummy player (seller 0),
which gives the boxed entry [0]. In our next step, we write down the satisfaction for each
coalition of sellers and buyers. For example, the satisfaction for the coalition seller 1,
buyer 1 is $u_1 + v_1 - a_{11} = 7 + 0 - 6 = 1$. We write down the satisfaction of seller 1 alone as
$u_1 + v_0 - a_{10} = 7 + 0 - 0 = 7$. The following matrix summarizes the satisfaction
details for u and v, given as a blue column vector (first column) and a blue row vector
(top row). Settled entries are boxed and the worst hit entry is marked by *.

               u_i    j=1   j=2   j=3   j=4   j=5  |  s_i
    v_j               [0]    0     0     0     0   |
    i = 1       7      1    [0]    3     2    -2*  |   0
    i = 2       7      3     4    [0]   -1     4   |   0
    i = 3       6      6     5     3    [0]    2   |   0
    i = 4       8      6     6     3     1    [0]  |  -1
    t_j                0     0     0     0     1

with $\beta = 1$.


Boxed coalitions are the settled coalitions consisting of (seller 0, buyer 1), (seller
1, buyer 2), (seller 2, buyer 3), (seller 3, buyer 4) and (seller 4, buyer 5). These entries
will not change anymore over the entire iterations. The most unhappy coalition is
seller 1, buyer 5. We have equivalence classes p = 0, 1, 2, 3, 4: j = 1 is in $N_0$, j = 2
is in $N_1$, i = 1 is in $M_1$, etc. Introducing the graph with vertices

    p:   [0]    1 → 4    2    3

the pair i = 1, j = 5 is in the block $(M_1, N_4)$, and so this active unsettled coalition
gets an arc joining node 1 to node 4 as indicated above. The next worst satisfied
passive coalition is i = 2, j = 4, and in terms of [p, q] it is equal to (2, 3). Here,
$t_j = l(q)$ if $j \in N_q$, so $t = (0, 0, 0, 0, 1)$, and $s_i = -l(p)$ if $i \in M_p$, so
$s = (0, 0, 0, -1)$.

We update the data with $\beta = 1$, which will improve the satisfaction of the coalition
i = 1, j = 5 from −2 to −1, and we will have both (i, j) = (1, 5) and (i, j) = (2, 4)
as active coalitions when updated, as represented below (see * entries):

$u'_i = u_i + \beta s_i$, $v'_j = v_j + \beta t_j$, giving $u' = (7, 7, 6, 7)^T$ and $v' = (0, 0, 0, 0, 1)$.

               u_i    j=1   j=2   j=3   j=4   j=5  |  s_i
    v_j               [0]    0°    0°    0°    1   |
    i = 1       7      1    [0]    3     2    -1*  |   0
    i = 2       7      3     4    [0]   -1*    5   |   0
    i = 3       6      6     5     3    [0]    3   |  -1
    i = 4       7      5     5     2     0    [0]  |  -1
    t_j                0     0     0     1     1


Here, the new worst hit coalitions after the initial improvement are (i, j) =
(1, 5), (2, 4). We should note that j = 5 belongs to $N_4$ and j = 4 belongs to $N_3$,
and i = 1, 2 belong to $M_1$, $M_2$. Thus p = 1, q = 4 and p = 2, q = 3 are the two
directed arcs introduced:

    p:   [0]    1 → 4    2 → 3

which gives

    t = (0, 0, 0, 1, 1)
    s = (0, 0, −1, −1).

Here, * entries correspond to the worst satisfied coalitions and ° entries mark the
next worst satisfied in line but passive coalitions. Thus for $\beta = 1$, the * entries are
improved to the level of the ° entries = 0. We update the table entries using the current
s and t. We set the new $u = (7, 7, 5, 6)^T$, and the new v is (0, 0, 0, 1, 2). With this,
we update the satisfaction matrix, and the updated matrix is given below:


               u_i    j=1   j=2   j=3   j=4   j=5  |  s_i
    v_j               [0]    0*    0*    1     2   |
    i = 1       7      1    [0]    3     3     0*  |  -1
    i = 2       7      3     4    [0]    0*    6   |  -1
    i = 3       5      5     4     2    [0]    3   |  -3
    i = 4       6      4     4     1     0*   [0]  |  -2
    t_j                0     1     1     3     2


The related graph is still proper, and it is given by (based on * entries)

    p:   [0] → 1,   [0] → 2,   1 → 4,   2 → 3,   4 → 3

and this gives $l(p) = t = [0\ 1\ 1\ 3\ 2]^T$ and $s = -t$. For $\beta = \tfrac12$, the worst hit * entries
increase to $\tfrac12$ while the second worst hit entries $f_{11}$ and $f_{43}$ both reach the value $\tfrac12$,
and the new u and v are given by $u = [6\tfrac12, 6\tfrac12, 3\tfrac12, 5]^T$ and $v = [0, \tfrac12, \tfrac12, 2\tfrac12, 3]$, and
the new satisfaction at these (u, v) is given by


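               u_i     j=1    j=2    j=3    j=4    j=5
    v_j                [0]    1/2*   1/2*   5/2     3
    i = 1      13/2    1/2*   [0]     3      4     1/2*
    i = 2      13/2    5/2     4     [0]     1    13/2
    i = 3       7/2    7/2     3      1     [0]    5/2
    i = 4        5      3     7/2    1/2*   1/2*   [0]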

Currently, the most unhappy coalitions have satisfaction $\tfrac12$, and they are at the *
locations. We get the graph corresponding to $M_0 \to N_1$, $M_1 \to N_0$, $M_0 \to N_2$,
$M_1 \to N_4$, $M_4 \to N_2$, $M_4 \to N_3$, given by

    p:   [0] ⇄ 1,   [0] → 2,   1 → 4,   4 → 2,   4 → 3
The graph is improper as we have the cycle 0 ⇄ 1. We melt node 1 into node 0. Thus,
all coalitions of the blocks (0, 1) and (1, 0) are settled and hence are left out from any
further calculation. They are in a blue block in the matrix.

After this merger the graph is proper. We put the blocks (0, 1) and (1, 0) together with
(0, 0), and we start again from the top singleton row. The satisfaction matrix itself is
the one displayed above; only the entries in rows 0, 1 and columns 0, 1, 2 are now
treated as settled.

The active to-be-settled coalitions are outside the boxed entries, and we have $M_0 \to N_2$,
$M_1 \to N_4$, $M_4 \to N_2$, $M_4 \to N_3$, giving the graph below, since 0 and 1 are identified
as the same. We have

    p:   [0, 1] → 2,   [0, 1] → 4,   4 → 2,   4 → 3

$l(p) = (0, 2, 2, 1) = t$ for p = [0, 1], 2, 3, 4, and $s = -t$. With $\beta = \tfrac12$, at least one of the
worst hit active coalitions * (here, say, seller 1, buyer 5) reaches the level of the second
worst hit coalitions in terms of satisfaction, denoted by °: (seller 3, buyer 3) and also
(seller 2, buyer 4). However, at $\beta = \tfrac12$, (seller 0, buyer 3), who was also the worst hit,
goes above the second worst hit coalition's satisfaction. We get the new u, v and the
satisfaction matrix at u, v as


               u_i      j=1     j=2    j=3    j=4    j=5
    v_j                 [0]    [1/2]   3/2    7/2    7/2
    i = 1     [13/2]   [1/2]   [0]      4      5      1*
    i = 2      11/2     3/2     3      [0]     1*     6
    i = 3       5/2     5/2     2       1*    [0]     2
    i = 4       9/2     5/2     3       1*     1*    [0]


The worst hit coalitions are seller 2, buyer 4; seller 3, buyer 3; seller 4, buyer 3;
and seller 4, buyer 4. Thus, we have $M_2 \to N_3$, $M_3 \to N_2$, $M_4 \to N_2$, $M_4 \to N_3$,
giving the graph

    p:   [0 1] → 4,   2 ⇄ 3,   4 → 2,   4 → 3
Since 2 ⇄ 3, node 3 is melted with 2, keeping all arcs. After this the graph is
proper. The blocks $(M_2, N_3)$ and $(M_3, N_2)$ get settled. After eliminating the above
cycle, we get the following proper graph:

    p:      [0 1] → 4 → [2 3]
    l(p):     0     1      2

Thus $l(p) = (0, 2, 1)$ for p = [0 1], p = [2 3], p = 4. The step size $\beta = \tfrac16$ will
be used with $t = (0, 2, 1)$ and $s = -t$.


Updating with the new u, v and satisfactions $f_{ij}(u, v)$, we get

               u_i      j=1     j=2     j=3     j=4     j=5
    v_j                 [0]     1/2    11/6    23/6    11/3
    i = 1      13/2     1/2     [0]    13/3    16/3     7/6
    i = 2      31/6     7/6     8/3     [0]      1     35/6
    i = 3      13/6    13/6     5/3      1      [0]    11/6
    i = 4      13/3     7/3    17/6     7/6     7/6     [0]


The least happy coalitional satisfaction is $\tfrac76$. The graph for this is

    p:   [0 1] → 4 → [2 3] → [0 1]

We now have a cycle [0 1] → 4 → [2 3] → [0 1]. Thus all are melted to
0. The algorithm stops with iteration 3 and $\alpha^3 = \tfrac76$. The nucleolus is read off from
the first column and top row of the matrix above:

$$u = \Big( \tfrac{13}{2}, \tfrac{31}{6}, \tfrac{13}{6}, \tfrac{13}{3} \Big)$$

for real sellers 1, 2, 3, 4 and $v = \big( 0, \tfrac12, \tfrac{11}{6}, \tfrac{23}{6}, \tfrac{11}{3} \big)$ for real buyers 1, 2, 3, 4, 5.
We illustrate with one more example.
Example 10

Find the nucleolus for an assignment game with 3 sellers and 3 buyers with the
coalitional worth of seller-buyer pairs given by

    6 7 7
    0 5 6
    2 5 8

where the rows represent sellers and the columns represent buyers. The nucleolus
is determined by the lexicographic ranking of the satisfactions of singleton and
buyer-seller coalitions arranged in a non-decreasing order. Since the singleton
players can also be considered as buyer-seller coalitions with a dummy seller 0 or dummy
buyer 0, the problem is treated simply as one of buyer-seller satisfactions among 4
sellers and 4 buyers, obtained by adjoining a dummy seller and a dummy buyer. The
augmented matrix is

                 buyer j =   0   1   2   3
    seller i = 0             0   0   0   0
    seller i = 1             0   6   7   7
    seller i = 2             0   0   5   6
    seller i = 3             0   2   5   8

$= (a_{ij})$ = coalitional worth matrix. The optimal assignment in this case is the
diagonal pairing $\sigma$: 0 → 0, 1 → 1, 2 → 2, 3 → 3.


We initiate the algorithm with all coalitional worth given away to the sellers and
nothing to the buyers. Let $(u', v')$ be the u-best v-worst imputation (0, 6, 5, 8; 0, 0, 0, 0).
The satisfactions $f_{ij}(u, v)$ at this imputation (here $u = (0, 6, 5, 8)^T$, $v = (0, 0, 0, 0)$)
are recorded in the matrix displayed below.

The most unhappy coalitions are (1, 2), (1, 3) and (2, 3) (denoted by *). The next
worst after the most unhappy coalitions are the pairs (0, 1), (0, 2), (0, 3) among all
these unsettled coalitions (denoted by °). The equivalence classes are

    p:   [0]   1   2   3

and the induced arcs are formed by $M_1 \to N_2$, $M_1 \to N_3$, $M_2 \to N_3$, so
the graph is

    p:   [0]    1 → 2 → 3,   1 → 3

We get the augmented satisfaction matrix with augmented s, t values as

               j=0   j=1   j=2   j=3  |  s_i
    i = 0      [0]    0°    0°    0°  |   0
    i = 1       6    [0]   -1*   -1*  |   0
    i = 2       5     5    [0]   -1*  |  -1
    i = 3       8     6     3    [0]  |  -2
    t_j         0     0     1     2

Let us find a $\beta$ such that, for $(u, v) + \beta (s, t)$, at least one * entry matches a °
entry. The satisfaction matrix is

     0         0         β        2β
     6        [0]      -1 + β   -1 + 2β
     5 - β    5 - β     [0]     -1 + β
     8 - 2β   6 - 2β    3 - β    [0]

For $\beta = 1$, the * entry (i = 1, j = 2) becomes 0, while the ° entry (i = 0, j = 1) is
still 0, and we have the updated matrix

               j=0   j=1   j=2   j=3  |  s_i
    i = 0      [0]    0*    1     2   |   0
    i = 1       6    [0]    0*    1   |  -1
    i = 2       4     4    [0]    0*  |  -2
    i = 3       6     4     2    [0]  |  -3
    t_j         0     1     2     3

while some entries like (i = 0, j = 2) became passive. Next in line, the entries
(i = 0, j = 1), (i = 1, j = 2), (i = 2, j = 3) are the worst hit ones, and we form
the graph

    p:   [0] → 1 → 2 → 3,      l(p) = t = (0, 1, 2, 3),   s = −t.

Again with this new $(u', v')$, we look for a suitable $\beta$. Let us write down the
satisfaction updates with $\beta$. We get

     0         β        1 + 2β   2 + 3β
     6 - β    [0]        β       1 + 2β
     4 - 2β   4 - β     [0]       β
     6 - 3β   4 - 2β    2 - β    [0]

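Here $f_{32} = 2 - \beta$ is the first to reach the increasing minimum level $0 + \beta$, at $\beta = 1$.
Thus we get the updated matrix as

               j=0   j=1   j=2   j=3  |  s_i
    i = 0      [0]    1*    3     5   |   0
    i = 1       5    [0]    1*    3   |  -1
    i = 2       2     3    [0]    1*  |  -2
    i = 3       3     2     1*   [0]  |  -2

The graph for the matrix is p: [0] → 1 → 2 ⇄ 3. Because of the cycle 2 ⇄ 3 the graph is
improper: nodes 2 and 3 are melted into the single node [2 3], and the coalitions (2, 3)
and (3, 2) become settled. The s column above already refers to the melted graph.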

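Let us update with a suitable $\beta$ to be determined. We have

     0         1 + β    3 + 2β   5 + 2β
     5 - β    [0]       1 + β    3 + β
     2 - 2β   3 - β     [0]      [1]
     3 - 2β   2 - β     [1]      [0]

We find that $2 - 2\beta$ decreases and $1 + \beta$ increases, and they become equal for $\beta = \tfrac13$.
Thus the next one is

               j=0     j=1    j=2    j=3
    i = 0      [0]     4/3*   11/3   17/3
    i = 1      14/3    [0]    4/3*   10/3
    i = 2       4/3*   8/3    [0]    [1]
    i = 3       7/3    5/3    [1]    [0]

The graph: p: [0] → 1 → [2 3] → [0].

It is improper, and we can melt nodes 1 and [2 3] into 0, and the algorithm terminates.
The nucleolus is simply the first column and top row, given by

$$u = \Big( 0, \tfrac{14}{3}, \tfrac{4}{3}, \tfrac{7}{3} \Big) \quad \text{and} \quad v = \Big( 0, \tfrac{4}{3}, \tfrac{11}{3}, \tfrac{17}{3} \Big).$$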
4.6 Domination Power Between Married Couple


Suppose we have 3 boys and 3 girls. Each boy can show interest to marry some of 
the girls. Each girl can also show interest to marry someone among these boys. Let 
A, B, C be boys and D, E, F be girls. A marriage could take place if both sides like 
each other as mates. Let $a_{ij} = 1$ if boy i and girl j like each other, and let $a_{ij} = 0$
otherwise. Suppose for our case let

            (D)  (E)  (F)
    (A)     [1]   1    0
    (B)      0   [1]   1
    (C)      0    0   [1]


represent mate possibilities. Clearly, the coalitions AD, BE, CF give the maximum
number of marriages. Of course, if we view the entries as total power available, we
can view any core allocation (which exists, as we consider only mates of the opposite
sex) as expressing a split of relative strength between a married boy and his mate.

Assumption: The nucleolus split of 1 unit between married couple expresses their 
relative domination power. 

We can show that the nucleolus split is $\tfrac34$ and $\tfrac14$ between A and D, the split is $\tfrac12$
and $\tfrac12$ between B and E, and $\tfrac14$ and $\tfrac34$ between C and F. Our initial intuition that C
and D will have the least power is also confirmed by the nucleolus solution.

We can also represent the current situation via the following graph. Let each 
married couple be represented as a vertex and an edge between 2 couples is introduced 
if the boy of the first couple could also marry the girl of the second couple. 


    (A, D) → (B, E) → (C, F)


We can think of this flow as a chain reaction that would result if one couple were
to separate and both tried to find other mates. As each new marriage would break up
another one, we can see the nucleolus values for the men decrease at a constant rate
as we move along the path: $\tfrac34$, $\tfrac12$, $\tfrac14$ respectively.

A second example: Let 7 boys and 7 girls induce the following marriage market 
where rows represent boys and columns represent girls; 


            1 1 0 0 0 0 0
            0 1 1 0 0 0 0
            0 0 1 0 0 0 0
    A   =   1 0 0 1 1 0 0
            0 0 0 0 1 1 0
            0 0 0 0 0 1 1
            0 0 0 0 0 1 1

As in the earlier algorithm, each so-called settled coalition (i i'), i = 1, 2, ..., 7, is
treated as a vertex of a graph. Each off-diagonal 1 is represented by a directed
edge. The graph we get is the following:

[Graph: vertices 1, ..., 7 with directed edges 1 → 2, 2 → 3, 4 → 1, 4 → 5, 5 → 6, 6 → 7, 7 → 6.]

We melt any cycle into a single vertex. Given such an acyclic graph, we introduce a
dummy vertex called the source and another dummy vertex called the sink. We call
them 0 and 8. If a vertex has no incoming edges, then we insert an edge
from the source to this vertex. Any vertex with no outgoing edge
will receive an edge to the sink 8. This is shown below


Now we have to find the longest path in this digraph. An algorithm like that of
Noltmeier [22] can be used. To perform this algorithm, we first build a tree using
depth-first search, starting at the source and going as far as possible; say we go via
0 → 1 → 2 → 3 → 8. We backtrack till we find another vertex. Here, we
must go all the way back to 0 to go to 4 → 5 → (67). Now that we have traveled
along all vertices, we have an initial tree. We look for any edge we can add to create a
longer path. Checking with the edge from (67) to 8, we have no way to increase the length
of the path. We retrace and find that if we replace edge 0 → 1 by 4 → 1, we can
increase the length of the path. These two stages are shown in the following 2 graphs.

[Figure: two trees on the vertices 0, 1, 2, 3, 4, 5, (6,7), 8. Left: the initial depth-first tree, with the candidate edge 4 → 1 drawn dashed. Right: the tree after edge 0 → 1 is replaced by 4 → 1.]


We have found the longest path as 0 → 4 → 1 → 2 → 3 → 8. Along this
path from source to sink, we define a value w for each vertex, going from 1 to 0 and
decreasing uniformly. In this case, the uniform rate is $\tfrac15$. These vertices are considered
settled in that their values remain fixed for the remainder of the algorithm. We will
show later that these values are the nucleolus values for boys. We then find another
path to settle, a path which starts and ends with settled vertices with everything in
between unsettled. We have only one such path: 4, 5, (67), 8. Since the end points 4
and 8 have values $\tfrac45$ and 0, the uniform rate is $\tfrac13 \big( \tfrac45 - 0 \big) = \tfrac{4}{15}$. This way all the vertices
are settled and we have the nucleolus value for each player. Every boy receives the
w value and his mate the 1 − w value. Since (67) are collapsed into a single vertex,
those pairs will receive the same split.

The nucleolus solution:

    Boys:    3/5   2/5   1/5   4/5   8/15    4/15    4/15
    Girls:   2/5   3/5   4/5   1/5   7/15   11/15   11/15
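
The path-settling procedure just described can be sketched in Python as follows. This is a sketch of the verbal description only, under stated assumptions: cycles have already been melted (so the couples 6 and 7 appear as the single vertex 6), paths are enumerated naively, which is fine only for small graphs, and the name `settle_by_longest_paths` is hypothetical. At every stage the path between two settled vertices with the smallest average drop per edge is settled; in the first stage this is the longest source-to-sink path.

```python
from fractions import Fraction

def settle_by_longest_paths(n, edges):
    """Values w(1), ..., w(n) for the boys on an acyclic graph with arcs `edges`."""
    source, sink = 0, n + 1
    succ = {x: [] for x in range(n + 2)}
    for i, j in edges:
        succ[i].append(j)
    targets = {j for _, j in edges}
    for x in range(1, n + 1):
        if x not in targets:            # no incoming edge: attach to the source
            succ[source].append(x)
        if not succ[x]:                 # no outgoing edge: attach to the sink
            succ[x].append(sink)
    w = {source: Fraction(1), sink: Fraction(0)}

    def paths(path):                    # extend through unsettled vertices to a settled one
        for y in succ[path[-1]]:
            if y in w:
                yield path + [y]
            else:
                yield from paths(path + [y])

    def average(path):                  # uniform drop per edge along the path
        return (w[path[0]] - w[path[-1]]) / (len(path) - 1)

    while len(w) < n + 2:
        candidates = [p for start in list(w) for p in paths([start]) if len(p) > 2]
        best = min(candidates, key=average)
        step = average(best)
        for k, x in enumerate(best[1:-1], start=1):
            w[x] = w[best[0]] - k * step
    return [w[x] for x in range(1, n + 1)]

# The 7-couple market above, with couples 6 and 7 melted into vertex 6.
print(settle_by_longest_paths(6, [(1, 2), (2, 3), (4, 1), (4, 5), (5, 6)]))
```

The first pass settles 4, 1, 2, 3 at 4/5, 3/5, 2/5, 1/5, and the second pass settles 5 and (67) at 8/15 and 4/15, in agreement with the table of nucleolus values for the boys.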

Hardwick in his Ph.D. thesis [14] considers the special class of assignment games
where each two-person coalition alone matters and they are worth either 1 or 0.
He gives two methods to compute the nucleolus. The first is a
further specialization of the graph theoretic algorithm of Solymosi and Raghavan
[29], which itself is a specialization of the Maschler-Peleg-Shapley iterative algorithm
[20] for arriving at the lexicographic geometric center. The critical step is to find
the longest paths ending in each vertex, for which the algorithm of Noltmeier [22] is
essential.

Hardwick's second method is much simpler to implement. At each step, we find
the longest path between two settled vertices; the path with the smallest average
satisfaction is settled, as this average gives us the max min $\beta$ value. This follows from the
following theoretical argument.

It is known that in any maximal matching with (i i'), the players i and i' split the
worth of their coalition exactly, i.e. $w(i) + w(i') = a_{ii'}$. If we assume full matching,
then $a_{ii'} = 1\ \forall\, i$, and so $w(i) = 1 - w(i')$ or $w(i') = 1 - w(i)$.

We know that for assignment games only one- and two-person coalitions are essential.
In fact, we show later that only coalitions that correspond to edges of our graph will be
essential. For example, if (i, j') is the coalition, the satisfaction at any imputation
w is $s_{ij}(w) = w(i) + w(j') - 1$ when (i, j) is an edge in G. It is $w(i) - w(j)$, as
(j j') is part of an optimal assignment.
Let us think of $w(i) - w(j) = s_{ij}(w)$ as the length of the edge $(i, j)$. If $w$ is a core
imputation, then the length is $L(i, j) = w(i) - w(j)$. Thus finding the nucleolus amounts
to producing the max min values of the edge lengths. For any path $v_0, v_1, \dots, v_k$, the sum
of edge lengths is $L(v_0, v_1) + L(v_1, v_2) + \cdots + L(v_{k-1}, v_k) = w(v_0) - w(v_k)$. In
case $v_0, v_1, \dots, v_k$ is a cycle, we have $v_k = v_0$, and the cycle length is
$w(v_0) - w(v_k) = 0$. However, since the nucleolus is a core allocation, $L(i, j) \ge 0$
for all $i, j \in V$. Since the cycle length is 0, each edge length $L(i, j)$ in a cycle is also 0,
i.e. $w(i) = w(j)$.

Let $G'$ be a new graph made by collapsing each cycle in $G$ into a single vertex.
For any path in $G'$, the sum of the edge lengths will be the same as in $G$.

Now consider the longest possible path in $G'$: $0, v_1, \dots, v_{k-1}, n+1$. This path
contains $k$ edges (here $0$ and $n+1$ are dummy vertices). The average edge length is
$\frac{w(0) - w(n+1)}{k}$. By arbitrarily setting $w(0) = 1$ and $w(n+1) = 0$, the minimum edge length
is at most $\frac{1}{k}$, and the nucleolus should maximize the minimum edge length on this
longest path. Thus, all edges in this path have the same length $\frac{1}{k}$. If our assumption
is indeed true, these values must get fixed. Thus, we can settle each vertex on this
path, so that the edge lengths will not change.

To show this formally, it is enough to show that in the algorithm of Solymosi and
Raghavan the first iteration gives $\delta^1 = \frac{1}{k}$.

In each iteration, we define the triple $(X, \Delta, \Sigma)$ where $X \subseteq$ core imputations.
Here, $\Delta$ is a set of settled coalitions, namely the satisfaction of any coalition belonging
to $\Delta$ is constant over the entire set $X$. That is,

$$\Delta = \{(i, j) \in M \times N : s_{ij}(x) = c_{ij} \ \forall x \in X\}$$

where $M = \{0, 1, 2, \dots, n\}$, $N = \{1', 2', \dots, (n+1)'\}$ and $M \times N$ is the set of all two-player
coalitions (or one-player coalitions, since $0$ and $(n+1)'$ are dummy players).
Even though we should consider all coalitions, it is sufficient to consider
only essential coalitions for which $s_{ij}(x)$ is not constant over all of $X$. Namely,
these are the coalitions outside $\Delta$; they form the set $\Sigma$ of unsettled coalitions.
We start with the core $X^0 = \{x : x$ is an imputation with $0 \le x(i) \le 1$ and
$x(i) - x(j) \ge 0 \ \forall (i, j) \in E\}$, where $E = \{(i, j) : a_{ij} = 1\}$ is the edge set.
Indeed, assume that besides $(i, i')$, $(j, j')$ is a matched pair with $j \in M$ and
$j' \in N$, so that $x(j') = 1 - x(j)$. If $X^0$ is the core, then $x(i) + x(j') \ge a_{ij}$ for all
$i \in M$, $j' \in N$. That is, $x(i) + (1 - x(j)) \ge a_{ij}$. But $(i, j) \in E \implies a_{ij} = 1$. Thus
$x(i) - x(j) \ge 0$.

Starting on the core with $\Delta^0 = \{(i, i') : i \in M \setminus \{0\}\}$ and $\Sigma^0 = (M \times N) \setminus \Delta^0$, as well
as $\delta^0$, we have for steps $r = 0, 1, 2$, etc.

$$\delta^{r+1} = \max_{x \in X^r} \ \min_{(i,j) \in \Sigma^r} s_{ij}(x)$$
$$X^{r+1} = \{x \in X^r : \min_{(i,j) \in \Sigma^r} s_{ij}(x) = \delta^{r+1}\}$$
$$\Delta^{r+1} = \Delta^r \cup \{(i, j) \in \Sigma^r : s_{ij}(x) = \delta^{r+1} \ \forall x \in X^{r+1}\}$$
$$\Sigma^{r+1} = \Sigma^r \setminus \{(i, j) \in \Sigma^r : s_{ij}(x) = \delta^{r+1} \ \forall x \in X^{r+1}\}.$$

The algorithm terminates with $\Delta = M \times N$, $\Sigma = \emptyset$, and $X$ consisting of a single
imputation, the nucleolus.

We will show $\delta^1 = \frac{1}{k}$. In the first iteration, we choose the longest path from $0$
to $(n+1)'$, and based on the path length we define $w(0) - w(n+1) = 1$ and $w(i) =
1 - \frac{d(i)}{k}$, where $d(i)$ is the length of the longest path from $0$ to vertex $i$. This
particular imputation gives, for each pair, the satisfaction

$$s_{ij}(w) = w(i) + w(j') - a_{ij} = w(i) - w(j) + 1 - a_{ij} = \frac{d(j) - d(i)}{k} + 1 - a_{ij}.$$

When $a_{ij} = 1$, we know $(i, j) \in E$ and that $d(j) - d(i) \ge 1$, i.e. $s_{ij}(w) \ge \frac{1}{k}$. When
$a_{ij} = 0$, $(i, j) \notin E$, but $d(i) - d(j) \le k - 1$ for all $(i, j) \in M \times N$, since the length of
the longest path is $k$. Thus $s_{ij}(w) = w(i) - w(j) + 1 - a_{ij} \ge -\frac{k-1}{k} + 1$ (when $a_{ij} =
0$), so $s_{ij}(w) \ge \frac{1}{k}$.

Thus, we have found an imputation $w$ in $X^0$ for which $\min_{(i,j)} s_{ij}(w) \ge \frac{1}{k}$. But
the average edge length of the longest path is $\frac{1}{k}$, and so the minimum edge length of this path
is at most $\frac{1}{k}$. Thus, we get

$$\max_{x \in X^0} \ \min_{(i,j) \in \Sigma^0} s_{ij}(x) = \frac{1}{k}.$$


Thus, all the vertices along the maximal path are settled, and for any $x \in X^1$
all the edges along that path satisfy $s_{ij}(x) = \frac{1}{k}$. We now adjoin the path $P$
to $\Delta^0$ and let $\Delta^1 = \Delta^0 \cup P$ and $\Sigma^1 = \Sigma^0 \setminus P$, and set $X^1 = \{x \in X^0 : s_{ij}(x) =
\frac{1}{k} \ \forall (i, j) \in P$ and $s_{ij}(x) \ge \frac{1}{k} \ \forall (i, j) \notin P\}$. But this is achievable if, for all $i$
with $(i, j) \in P$, we fix $x(i) = w(i)$. We can define $V_P = \{i : (i, j) \in P\}$ and thus
$X^1 = \{x \in X^0 : x(i) = w(i) \ \forall i \in V_P, \ x(i) \le w(i) \ \forall i \notin V_P\}$. Instead of just fixing
satisfactions for coalitions, we are actually fixing the payoff for each couple along the
path.

Let us assume that at some iteration $r$ we have defined $(X^r, \Delta^r, \Sigma^r)$ and also $\delta^r$.
With it, we define an imputation $w^r$, which is the southwest corner of $X^r$, giving the
highest possible value of $x(i)$ for all $i \in M$ and $x \in X^r$. Suppose we have also defined
a forest $F^r \subset \Sigma^r$ with the following properties: for all $(i, j) \in F^r$, $s_{ij}(w^r) = \delta^r$, and
for all $j$ with $(i, j) \in \Sigma^r$ there is an edge $(i, j) \in F^r$ for exactly one $i$. (We
can check that $F^1 \cup P$ is the tree constructed in the first iteration by the Noltemeier
algorithm.)

Here we will show how we can apply Hardwick's algorithm to solve the following
$14 \times 14$ assignment game. Here rows represent males and columns represent females,
and $a_{ij} = 1$ when male $i$ and female $j$ like each other, $a_{ij} = 0$ otherwise.

11000001000000
01100000000000
00110010000000
01011000000000
00001100000000
00000100000000
00000010000000
00000001100000
00000010110000
00000000011000
00000000001100
00001000000100
00000000000011
00000001000001


The graph is constructed as follows. We create one vertex for each row/column,
plus two extra dummy vertices, which we call $0$ and $n+1$. Since there are 14 rows
and columns in this example, we name the vertices 0 through 15. Next, for each
off-diagonal 1, we create a directed edge from the row number to the column number.
We also create edges from 0 to any vertex whose corresponding column has
a 1 only in the diagonal position, and from any vertex whose row has
a 1 only in the diagonal position to $n+1$. We do not bother creating dummy edges
for the other vertices, as they will be irrelevant since we know there will be a longer
path on the graph. The resulting graph for this example is found in Fig. 4.2.
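The construction rules above can be sketched as follows. The function name `build_graph` is an assumption made for illustration, and the matrix is transcribed here under the assumption that every row has exactly 14 entries with a 1 on the diagonal (the full matching).

```python
# The 14 x 14 matrix of the example, one string per row (row i = male i).
ROWS = [
    "11000001000000", "01100000000000", "00110010000000", "01011000000000",
    "00001100000000", "00000100000000", "00000010000000", "00000001100000",
    "00000010110000", "00000000011000", "00000000001100", "00001000000100",
    "00000000000011", "00000001000001",
]

def build_graph(rows):
    """Return the set of directed edges on vertices 0..n+1."""
    n = len(rows)
    a = [[int(ch) for ch in row] for row in rows]
    edges = set()
    for i in range(n):
        for j in range(n):
            if i != j and a[i][j] == 1:
                edges.add((i + 1, j + 1))        # off-diagonal 1: row -> column
    for v in range(n):
        column_has_other_ones = any(a[i][v] for i in range(n) if i != v)
        row_has_other_ones = any(a[v][j] for j in range(n) if j != v)
        if not column_has_other_ones:
            edges.add((0, v + 1))                # dummy source edge
        if not row_has_other_ones:
            edges.add((v + 1, n + 1))            # dummy sink edge
    return edges

edges = build_graph(ROWS)
source_targets = sorted(v for u, v in edges if u == 0)
sink_origins = sorted(u for u, v in edges if v == 15)
```

Under these assumptions the source is joined to vertices 1 and 13 and the sink is reached from vertices 6 and 7.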

Next, we use Noltemeier’s algorithm to build a tree, initially using a depth first 
search on the graph, resulting in Fig. 4.4, then finally at 4.4. Solid edges represent 
edges in the tree, and dashed edges represent those in graph, but not the tree. 
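As a rough stand-in for the tree update (this is not Noltemeier's algorithm, only a brute-force sketch for small graphs), one can collapse cycles with a naive transitive closure and then search the resulting acyclic graph for the longest path. The edge list below is the graph of the example written out by hand; the function names are assumptions.

```python
from functools import lru_cache

EDGES = [(0, 1), (0, 13), (1, 2), (1, 8), (2, 3), (3, 4), (3, 7), (4, 2),
         (4, 5), (5, 6), (6, 15), (7, 15), (8, 9), (9, 7), (9, 10),
         (10, 11), (11, 12), (12, 5), (13, 14), (14, 8)]

def collapse_cycles(edges):
    """Map every vertex to a representative of its cycle class and
    return that map together with the collapsed (acyclic) edge set."""
    nodes = sorted({v for e in edges for v in e})
    reach = {u: {u} for u in nodes}
    changed = True
    while changed:                       # naive transitive closure
        changed = False
        for u, v in edges:
            new = reach[v] - reach[u]
            if new:
                reach[u] |= new
                changed = True
    # u and v lie on a common cycle when each reaches the other.
    rep = {u: min(v for v in reach[u] if u in reach[v]) for u in nodes}
    dag = {(rep[u], rep[v]) for u, v in edges if rep[u] != rep[v]}
    return rep, dag

def longest_path(dag, source, sink):
    """Longest source-to-sink path in an acyclic edge set, as a vertex list."""
    succ = {}
    for u, v in dag:
        succ.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def best(u):
        if u == sink:
            return (sink,)
        tails = [t for t in (best(v) for v in succ.get(u, [])) if t is not None]
        if not tails:
            return None
        return (u,) + max(tails, key=len)

    return list(best(source))

rep, dag = collapse_cycles(EDGES)
path = longest_path(dag, 0, 15)
```

Here vertices 2, 3 and 4 share the representative 2, which plays the role of the collapsed vertex (234), and `path` has ten edges.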

By deleting the edges along the longest path from 0 to 15, but keeping the rest,
we generate a forest with settled vertices as the base for each tree in the forest. The
initial path had 0 and 15 as the only settled vertices, and $w^1$ was the equal split over the
path length. We now work on each tree of the forest, looking for similar pairs of settled
end vertices with unsettled vertices in between. We choose the path with the smallest average
satisfaction to settle, as that average gives us the next max min $\delta$ value. The new graph
is shown with the settled vertices circled.

[Figure: the graph after the first iteration, settled vertices circled]


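We have the path $0 \to 1 \to 8$ with ends 0 and 8 as settled vertices.
We have the path $0 \to 1 \to (234) \to 5$ with ends 0 and 5 as settled vertices.
We have the path $0 \to 1 \to (234) \to 7 \to 15$ with ends 0 and 15 as settled vertices.
We have the path $9 \to 7 \to 15$ with ends 9 and 15 as settled vertices.

The $\delta$ value for $0 \to 1 \to 8$ is $\frac{w(0) - w(8)}{2} = \frac{1 - 7/10}{2} = \frac{3}{20}$.
The $\delta$ value for $0 \to 1 \to (234) \to 5$ is $\frac{w(0) - w(5)}{3} = \frac{1 - 2/10}{3} = \frac{4}{15}$.
The $\delta$ value for $0 \to 1 \to (234) \to 7 \to 15$ is $\frac{w(0) - w(15)}{4} = \frac{1}{4}$.
The $\delta$ value for $9 \to 7 \to 15$ is $\frac{w(9) - w(15)}{2} = \frac{6/10}{2} = \frac{3}{10}$.

The least $\delta$ value is $\frac{3}{20}$.
Thus, the next step is to settle vertex 1 with $w(1) = 1 - \frac{3}{20} = \frac{17}{20}$.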

Notice that in each step we build no more than $n$ trees, giving $O(n^3)$ calculations
per step, as Noltemeier's algorithm [22] uses $O(n^2)$ calculations for each tree.
Since at least one vertex is settled in each step, the total number of steps is at most $n$. Thus
we have $O(n^4)$ calculations in total.
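The settling loop of the second method can be sketched on the collapsed graph of the 14 x 14 example. This is a minimal illustration under stated assumptions: paths are enumerated by brute force instead of being read off Noltemeier trees (so it does not have the complexity discussed above), vertex 2 stands for the collapsed cycle (2, 3, 4), and the names `candidate_paths` and `settle_all` are hypothetical.

```python
from fractions import Fraction

DAG = [(0, 1), (0, 13), (1, 2), (1, 8), (2, 5), (2, 7), (5, 6), (6, 15),
       (7, 15), (8, 9), (9, 7), (9, 10), (10, 11), (11, 12), (12, 5),
       (13, 14), (14, 8)]   # vertex 2 stands for the collapsed cycle (2, 3, 4)

def candidate_paths(succ, w):
    """Yield every path that starts and ends at settled vertices and has
    at least one interior vertex, all interior vertices being unsettled."""
    def extend(path):
        for v in succ.get(path[-1], []):
            if v in w:
                if len(path) >= 2:
                    yield path + [v]
            else:
                yield from extend(path + [v])
    for s in list(w):
        yield from extend([s])

def settle_all(dag, source, sink):
    succ = {}
    for u, v in dag:
        succ.setdefault(u, []).append(v)
    nodes = {v for e in dag for v in e}
    w = {source: Fraction(1), sink: Fraction(0)}
    deltas = []
    while len(w) < len(nodes):
        def rate(p):
            # average edge length between the settled end points
            return (w[p[0]] - w[p[-1]]) / (len(p) - 1)
        path = min(candidate_paths(succ, w), key=rate)
        delta = rate(path)
        for step, v in enumerate(path[1:-1], start=1):
            w[v] = w[path[0]] - step * delta
        deltas.append(delta)
    return w, deltas

w, deltas = settle_all(DAG, 0, 15)
```

The sketch assumes every vertex lies on some path from the source to the sink; otherwise the search for a candidate path would come up empty.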

We will solve our $14 \times 14$ game¹ with its attached graph at the end of the first
iteration using Noltemeier's algorithm. The longest path is $0 \to 13 \to 14 \to 8 \to 9 \to
10 \to 11 \to 12 \to 5 \to 6 \to 15$, giving $w(0) = 1$, $w(13) = \frac{9}{10}$, $w(14) = \frac{8}{10}$,
$w(8) = \frac{7}{10}$, $w(9) = \frac{6}{10}$, $w(10) = \frac{5}{10}$, $w(11) = \frac{4}{10}$, $w(12) = \frac{3}{10}$, $w(5) = \frac{2}{10}$,
$w(6) = \frac{1}{10}$, $w(15) = 0$. Since all these vertices are settled, we consider the forest
induced by deleting the edges connecting any two settled vertices. We now remove the
edge joining 1 and 8 and keep the other paths as below.


¹ These are found on pages 23, 24, 25 and 33 of John Hardwick's Ph.D. thesis [14].
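[Figure: the remaining forest after vertex 1 is settled]

$1 \to (2,3,4) \to 5$ gives $\delta = \frac{17/20 - 2/10}{2} = \frac{13}{40}$.

$1 \to (2,3,4) \to 7 \to 15$ gives $\delta = \frac{17/20 - 0}{3} = \frac{17}{60}$.

These are the paths ending in settled vertices. The least is $\frac{17}{60}$, attained for the path $1 \to
(2,3,4) \to 7 \to 15$. Thus $w(2,3,4) = \frac{17}{20} - \frac{17}{60} = \frac{17}{30}$. Now we are left with just
one path, $(2,3,4) \to 7 \to 15$, and $\frac{17}{30} - \frac{17}{60} = \frac{17}{60} = w(7)$. We have obtained the nucleolus
for all players. It is $w(0) = 1$, $w(1) = \frac{17}{20}$, $w(2) = \frac{17}{30}$, $w(3) = \frac{17}{30}$, $w(4) =
\frac{17}{30}$, $w(5) = \frac{2}{10}$, $w(6) = \frac{1}{10}$, $w(7) = \frac{17}{60}$, $w(8) = \frac{7}{10}$, $w(9) = \frac{6}{10}$, $w(10) =
\frac{5}{10}$, $w(11) = \frac{4}{10}$, $w(12) = \frac{3}{10}$, $w(13) = \frac{9}{10}$, $w(14) = \frac{8}{10}$, $w(15) = 0$ (Fig. 4.2).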


4.6.1 The Kernel/Nucleolus of a Standard Tree Game [13] 


Let T be a rooted tree with root o and with exactly one edge emanating from o. 
Let V be the vertex set and E be the directed edge set. If N is the player set, then 
each player occupies exactly one vertex and |V| < |N|. Players occupying the same 
vertex are called neighbors. The root vertex o has no occupant and is called a public 
vertex. Every other vertex v € V is occupied by at least one player. Each edge e € E 
carries with it a cost c(e) > 0. 
Fig. 4.2 We construct a tree and then iteratively update it to find the maximum depth of each vertex, and to collapse any cycles


Fig. 4.3 An example of a standard tree game


Fig. 4.4 An example of a standard tree game


The construction of underground water pipes to supply Lake Michigan water to
villages in Cook and DuPage counties in the state of Illinois can be modeled as a standard
tree game.

Sharing the cost of runway construction at an airport among airline companies possessing
planes of varying lengths and volumes can be modeled as a tree enterprise
which is also a chain.

The key observations are: (1) The TU game is a cost game where, for each coalition
$S$, the coalitional cost $c(S)$ is the minimal cost needed to join all members of $S$ to
the root $o$ via a connected subgraph of $(V, E)$. (2) The kernel is also the nucleolus.
(3) The game is convex in the sense of [28].

The algorithm of [13] initially computes the protonucleolus, an approximation
that is refined by removing bad arcs one by one until one arrives at the nucleolus. An
arc $e = (v_i, v_j)$ is bad when $x_i > x_j$ for the protonucleolus $x$ (Figs. 4.3 and 4.4).


Example (Granot et al. [13]) 


All players in a vertex are charged the same amount. Let $d_{v_k}$ = # residents at $v_k$ +
# edges that leave $v_k$. Thus $d_{v_1} = 4$, $d_{v_2} = 1$, $d_{v_3} = 3$, $d_{v_4} = 2$, $d_{v_5} = 1$. If
$e = (v_p, v_q)$ and $x_p$ is the protonucleolus at $v_p$, then the protonucleolus at $v_q$ is $[x_p +
c(e)]/d_{v_q}$. We can see $(3, 3, 4, 2, 4, 7)$ as the protonucleolus (here $x_o = 0$). Here,
player 4, who is further away from the root than players 1 and 2, is charged
less, and $(v_1, v_3)$ is a bad arc.


A chain example (Figure 4.5) 


For this chain example, $x_1 = 1$, $x_2 = x_3 = 5/3$, $x_4 = x_5 = x_6 = 11/9$. Thus $(v_2, v_3)$
is a bad edge. Trim it, add its cost to the cost of $(v_1, v_2)$, and put players 4, 5, 6 together with
2, 3 in $v_2$. Trimming bad edges again and again gives the nucleolus as (1, 1.4, 1.4, 1.4, 1.4, 1.4).
This corresponds to the final trimmed chain (Fig. 4.5).


Fig. 4.5 A chain example 


4.6.2 Balanced Connected Games [30] 


Routing, sequencing and scheduling situations modeled
as TU games often share an added feature: all the essential coalitions (those for which
breaking them up is disadvantageous to the players) are connected with respect to a fixed
order of the players. These games are often balanced, but connected coalitions
could still be of formidable size. If $\Gamma$ is a minimal balanced collection among
essential coalitions, then $\Gamma$ is just a proper interval partition of $N$ [10]. Like
the southwest corner extreme vertex in assignment games, we have the lex min
vertex $\bar{x}$ of the core $C = \{x \in \mathbb{R}^n : x(S) \ge v(S) \ \forall$ intervals $S$ and singletons$\}$,
defined recursively by $\bar{x}_1 = \min\{x_1 : x \in C\}$, $\bar{x}_2 = \min\{x_2 : x \in C, x_1 = \bar{x}_1\}$, ..., $\bar{x}_n =
\min\{x_n : x \in C, x_1 = \bar{x}_1, x_2 = \bar{x}_2, \dots, x_{n-1} = \bar{x}_{n-1}\}$. Here $\bar{x}_j = \max\{v[i\,j] - \bar{x}[i\,j-1] :
1 \le i \le j\}$. Thus $\bar{x}[1\,j]$ is the value of the maximal interval partition of
coalition $[1\,j]$. As in assignment games, one can apply the Maschler-Peleg-Shapley
algorithm to the collection of interval coalitions partitioned into $\Delta^0 = \{N\}$ and
$\Sigma^0 = \{$set of all interval coalitions$\} - \{N\}$. Each time some element $I$ of $\Sigma$ is
removed, we start with $\Delta^1 = \{N, I\}$, $\Sigma^1 = \Sigma^0 - I$, on the narrowed subset of the
core and reach its lex min vertex. The algorithm terminates in $O(n^4)$ operations with
the nucleolus.
4.6.3 Nucleolus for Cyclic Permutation Games [31, 33]


Suppose there are $n$ customers forming a queue in front of a counter. Each customer
needs exactly the same amount of service time. Due to their professional urgencies,
the waiting time costs could vary from person to person. If side payments are allowed,
any coalition $S$ of customers can permute their mutual positions among themselves
without affecting the waiting times of the members of the complementary coalition $S^c$. Let
$a_{ij} \ge 0$ be the gain to player $i$ from swapping his/her position with player $j$ due to side
payment. We have a TU game $v(S) = \max_\sigma \sum_{i \in S} a_{i\sigma(i)}$ (here $\sigma$ is any $S$-permutation).
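For small games the worth $v(S)$ can be evaluated by brute force over all permutations of $S$. The sketch below is only an illustration: the function name `coalition_worth` and the $3 \times 3$ matrix are hypothetical and are not taken from the examples in this section; players are indexed from 0.

```python
from itertools import permutations

def coalition_worth(a, coalition):
    """v(S): the best total gain over all permutations of S onto itself."""
    members = sorted(coalition)
    return max(
        sum(a[i][j] for i, j in zip(members, image))
        for image in permutations(members)
    )

# Hypothetical gain matrix: a[i][j] is the gain to player i from taking
# the position of player j.
A = [[0, 3, 1],
     [2, 0, 4],
     [5, 1, 0]]

grand = coalition_worth(A, {0, 1, 2})   # attained by the cycle 0 -> 1 -> 2 -> 0
pair = coalition_worth(A, {0, 1})       # players 0 and 1 swap
single = coalition_worth(A, {0})        # a single player stays put
```

The enumeration grows factorially with the size of $S$, so it is meant only for checking small examples by hand.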

By the duality theorem, an optimal permutation $\Pi$ of $N = \{1, 2, \dots, n\}$ will
give a pair of $n$-vectors $u \ge 0$, $v \ge 0$ in $D = \{(u, v) : u \ge 0, v \ge 0, a_{ij} - u_i -
v_j = 0 \ \forall (i, j) \in \Pi$ and $a_{ij} - u_i - v_j \le 0 \ \forall (i, j) \notin \Pi\}$.

The game has a non-empty core [33]. Further, the core is $C = m(D)$, where $m(u, v) =
u + v$ (see [8, 24]).

Here $D$ is the core of the assignment game induced by the matrix $A$.

Unfortunately, this embedding cannot help to recover the nucleolus of the permutation
game from the nucleolus of the above assignment game with matrix $A$. While
the number of essential coalitions is $O(n^2)$ for assignment games, it could be of order
$O(2^n)$ for general permutation games. In general, the grand coalition is inessential
for assignment games, while, say, for cyclic permutation games (games where $v(N) =
\sum_i a_{i\Pi(i)}$ and $\Pi$ is a cyclic permutation), the grand coalition is essential. The optimal
permutation $\Pi$ defines intervals $[i\,j]$, where $[i\,j] = \{i, \Pi(i), \Pi^2(i), \dots, \Pi^k(i) = j\}$
and $[i\,j] = \{i\}$ when $i = j$. For any permutation $\sigma$ which maps $i$ to $j$, we get a
cyclic $[j\,i]$ permutation if $\sigma(h) = \Pi(h)$ for $h \in [j\,i] - \{i\}$.

The assignment game algorithm [29] can be used to locate the nucleolus for 
cyclic permutation games, but we should start with only the least satisfied coalitions 
of the buyer-seller type. If we made the main diagonal as the optimal assignment by 
swapping columns, we should re-swap the columns to their initial positions and use 
the lexicographic $u, v$ solution. The nucleolus for the permutation game will be the
vector $m(u, v) = u + v$.

For the cyclic permutation game $A$ with $\Pi(1) = 2$, $\Pi(2) = 3$, $\Pi(3) = 1$, $A =$

2@ 4 

0 1 @j]. The nucleolus is (9/2, 2, 11/2) based on pair nucleolus (4, 2, 9/2; 

® 2 
1/2, 0, 1), while the nucleolus for the assignment game is (3, 1, 3; 2, 1, 2). All that merging
$u, v$ by $u + v$ gives is the core allocation (5, 2, 5) for the permutation
game.


Remark 4.9 Consider the permutation game with matrix A = 


For this game, the image under the map $m$ of the pair nucleolus (8, 2, 2, 2; 0, 0, 0, 2)
gives only the core element $x^* = u + v = (8, 2, 2, 4)$. However, the nucleolus for
this permutation game is (23/3, 7/3, 7/3, 11/3). We can easily check that the second
imputation is lexicographically better than $x^*$.

It is an open problem to find an efficient algorithm for the nucleolus of non-cyclic 
permutation games. 


4.6.4 Brief Remarks on Some References Included 


Suzuki and Nakayama [32]: They model water resource development as a coop- 
erative venture and propose the nucleolus solution as a fair solution to venturing 
participants with capital costs and reaping benefits. 


Bird [5]: Every game generated from a minimum cost spanning tree with an immov- 
able source admits a non-empty core. 


Huberman [15]: A coalition $S$ is essential if and only if $v(S) > \sum_{T \in \Pi} v(T)$, where $\Pi$ is
any non-trivial partition of $S$. Any TU game $(v, N)$ with non-empty core is determined
just by the essential coalitions.


Muto et al. [21]: In a market with n persons, if only one among them possesses the 
patent for a new product, the game admits a non-empty core and a stable set. 


Arin and Feltkamp [1]: A game $(v, N)$ is veto-rich when one player is critical for
all positive coalitional worth. The nucleolus can be computed without any linear
program. When all players other than the veto-rich player are homogeneous, this
is the same as the Ricardian model considered by Chetty et al. [7], who give an
explicit formula for the nucleolus.


Branzei et al. [6]: Here, they introduce the notion of strongly essential coalitions.
In any game, the collection of strongly essential coalitions contains all the coalitions
which determine the core and, when the core is non-empty, the nucleolus. They consider
a special class of games called peer group games, which have at most $2n - 1$
strongly essential coalitions. However, for the same class, the essential coalitions could
be as many as $2^{n-1}$. Here, they compute the nucleolus in $O(n^2)$ time directly from
the data.


Kleppe et al. [16]: Here, they study the weighted nucleolus and the per capita nucleolus.
They show the modifications needed to approximate weighted nucleoli.
5.1 Introduction


In the modern world, computed tomography (CT) is one of the most successful and
indispensable diagnostic tools in medical imaging. Many
reports worldwide directly or indirectly reflect this success through the increase
in the number of annual CT examinations Geyer et al. [39]. The
first tomographic imaging modality was introduced in the early 1970s Ambrose [2],
Hounsfield [49]; since then, the hardware of CT machines has developed at an amazing speed
Flohr et al. [37], Hsiao et al. [50], Zhang et al. [105]. This progress
can be gauged by the time taken to scan a patient then and now. The first clinical
scan took approximately 5 min, whereas a modern machine finishes a rotation
in approximately a quarter of a second Geyer et al. [39]. There has also been significant
progress in the speed of the reconstruction algorithms and in the quality and
resolution of the reconstructed images. The development of
CT scanners concerns not only scanning/rotation and reconstruction speed
but also the modality as a whole, such as dual source, dual energy, detector size and
many more developments Flohr et al. [37].

From a commercial/business point of view, the worldwide increase in the sale of
CT scanners is a joy for manufacturers and for many minor providers of such a niche
medical imaging diagnostic tool. But at the same time, radiation exposure has always
been a worry for the user community Abbas
et al. [1]. The radiation exposure to the users has significantly increased since the
introduction of CT imaging. Because of this hazardous aspect of the CT imaging modality,
the scientific community is working hard to tackle the health risks involved, and
such efforts have shown effective and promising results in significantly reducing
the CT dose. There are many ways to look at CT dose reduction. The first is to
reduce CT radiation exposure by using the technique
only when the benefits outweigh the risk as well as the cost Kalra et al. [57], Slegel
et al. [87]. The other is to reduce the CT dose through the development of sophisticated
algorithms, as well as hardware, which can produce high-quality reconstructed images. The
efforts in this direction have shown significant improvements Fleischmann and Boas [36].
Multiple dose reduction methods were introduced, including tube current modulation,
organ-specific care, beam shaping filters and the optimization, following CT protocols,
of essential parameters such as tube current (mA), tube
voltage (kV), pitch, voxel size, slice thickness, reconstruction filters and the number
of rotations Kalra et al. [51], Slegel et al. [87]. Different combinations of
parameters yield significantly different image qualities while delivering the same
radiation dose to the patient. In the clinical routine, radiation exposure is frequently
controlled by adjusting the tube current. A decrease in the tube current, however,
brings a corresponding increase in the image noise. To counter this compromise
in image quality, there is a need for continuous development of
reconstruction filters and algorithms, which control the quality of the
reconstructed images Brenner and Hall [11], Murphy et al. [73].

In this article, we will navigate through the different reconstruction algorithms developed
since the first CT reconstruction algorithm was introduced Willemink and Noel [97]. Our
aim is to analyze the CT reconstruction algorithms with an emphasis on the sampling
of projection data. Broadly speaking, there are two categories of CT reconstruction
algorithms: one is Filtered Back Projection (FBP) and the other is dominated
by iterative reconstruction techniques. The first and original CT image reconstruction
algorithm was based on an iterative method and is popularly known as the algebraic
reconstruction technique (ART) Gordon et al. [41]. Due to a lack of computational
power, ART was quickly replaced by the filtered back projection technique Kak
and Slaney [56]. Reconstruction speed, in terms of computational complexity,
was the major factor in the wide adoption of FBP. In fact, for decades FBP was
the only choice for commercial CT scanners Pan et al. [79]. In 2009, the first
iterative reconstruction technique was clinically introduced Hsieh [52]. Since then,
all major CT vendors have introduced iterative algorithms for clinical use Nuyts et al. [78].
Interestingly, there was a big overlap between the hype of machine learning/artificial
intelligence and iterative algorithms for CT image reconstruction. While navigating
through the development of iterative reconstruction algorithms, our subsequent
aim is to understand the direction of CT image reconstruction algorithms in the
domain of artificial intelligence. This raises a big question: have we done
enough to develop a CT image reconstruction algorithm based on non-uniform
sampling? The question is meant to provoke mathematicians
and CT image reconstruction experts to consider closer collaboration with the engineers
who design tomographic machines. The research and design community
of tomographic systems has to keep dose reduction as the central objective;
otherwise the argument for replacing filtered back projection will
remain an academic discussion without product development reality.


5.2 From Filtered Back Projection to Iterative 
Reconstruction Technique to Clinical Necessity 


In December 1970, Gordon et al. presented the first iterative reconstruction algorithm,
still popularly known as the Algebraic Reconstruction Technique (ART), which was initially
applied to reconstruct cross-sectional images. It was practically impossible to
perform image reconstruction in a reasonable time using the ART algorithm, mainly
because of a lack of computational power. In 1972, Hounsfield invented
an X-ray computed tomographic scanner, for which he received the Nobel Prize
along with Allan Cormack, who independently discovered some of the algorithms
Hounsfield [49]. Initially, the convolution back projection and then the filtered back
projection algorithms gained ground, mainly thanks to their simplicity
and reconstruction speed. Hence for a long time, the algorithm used for CT image
reconstruction in commercial scanners was filtered back projection, and then a largely
modified version of the FBP algorithm, which was developed originally for approximate
image reconstruction from circular cone beam data by Feldkamp, Davis and Kress
(FDK) in 1984 Feldkamp et al. [34]. As far as the diagnostic image quality of CT
reconstructed images is concerned, Feldkamp et al.'s algorithm played a major role,
with certain minor changes over the years. On the other hand, the biggest disadvantages
of the FDK algorithm are the direct relation between image noise and
radiation exposure, and the need to follow the Nyquist sampling rate when acquiring
the projection data for perfect reconstruction Herman [48], Mersereau and
Oppenheim [69].

While commercial scanners operated with Feldkamp et al.'s algorithm, the CT
image reconstruction community invested significant effort and time in
the development and advancement of iterative algorithms Cheng et al. [23], Hara
et al. [45], Lin et al. [65], Yu et al. [102]. The main focus of this work was
to enable low-dose CT algorithms with high diagnostic image quality.

Broadly speaking, these advancements fall mainly into two categories: (i)
iterative algorithms and (ii) hybrid algorithms. The different algorithmic approaches
in these categories mainly focus on dose reduction together with improvement of the
diagnostic image quality, while also keeping the computational complexity in check. The
iterative algorithms mainly solve the linear system of equations with a combination
of a data term and a regularization term (or prior term). The data term
deals with the observed projection data, whereas the regularization term deals with
the non-uniformities of the CT system. There are different statistically based iterative
algorithms, such as maximum likelihood, least squares and maximum a posteriori
estimates. Such methods provide great flexibility in dealing with noise and artifacts.
In the case of hybrid algorithms, analytical and iterative methods are combined:
analytical methods are mainly used to generate the initial images, and then, as
a second step, iterative methods are used to optimize the image characteristics B and
Veo [5], Leipsic et al. [63], Thibault et al. [90], Winklehuer et al. [98].

The main progress in computational complexity has been achieved by using
graphics processing units (GPUs) to accelerate CT image reconstruction algorithms.
GPUs and high computational power allowed sequential algorithms to be
reimplemented successfully as parallel algorithms, and this
advance was a major trigger for the medical
device industry to try alternative algorithms Eklund et al. [32], Kim et al. [59],
Yan et al. [99].

In 2009, the first iterative reconstruction-based algorithm (Iterative Reconstruc- 
tion in Image Space IRIS, Siemens Healthineers) received FDA approval Grant and 
Flohr [42]. In a way, the algorithm is a hybrid of FBP and IR. In this algorithm, 
only a single backward projection step is applied for the creation of a cross-sectional 
image, using the raw data. The reconstructed image quality and other details can be 
seen in Gordic et al. [40], Moscariello et al. [72]. 


5.3 Reconstruction Algorithms as Inverse Problems


The CT image reconstruction process can be modeled as the following linear system
of equations:


$$y = Ax \qquad (5.1)$$


where $y$ denotes the raw data vector (say of size $M$) that expands the data function, $x$ is
the reconstructed image vector (say of size $N$) that expands the underlying image function,
and $A$ is the system matrix of size $M \times N$, which models the discrete X-ray transform.
The system matrix depends upon the choice of basis functions for expanding the data
and image functions. There are different techniques for computing the discrete X-ray
transform; in other words, the estimation of the matrix $A$ plays a vital role and can
be done in many different ways. The primitive and intuitive way to estimate the
elements of the system matrix $A$ is to calculate the intersection length of an X-ray
with an image voxel. The measured data $y$ denotes either simulated data or real data
used for the development of the reconstruction algorithms. The complete details can be
found in Kak and Slaney [56].
Consider the following equation:

$$D(x) = \frac{|Ax - y|}{M} \qquad (5.2)$$


i.e., the average Euclidean data divergence per detector pixel between the measured
data and the imaging model. The data divergence term $D(x)$ can be zero only in
a simulation study in which the 'data' is generated by the use of the imaging model.
This is one way to generate simulated data; another way is to
estimate the line integrals of a phantom. The data divergence term in
Eq. 5.2 will necessarily be non-zero in two scenarios: when the data vector
$y$ is a real data set, or when it is simulated using the line integrals of the phantom.
Standard techniques such as the SVD for analyzing the properties of the system matrix $A$
are unfortunately impractical because of the sheer size of the matrix, which in general
is of size $10^{10} \times 10^{10}$ for a typical cone beam CT image reconstruction problem Han
et al. [44].

It was Gordon et al. in 1970 who presented the first iterative algorithm, ART, to
solve such a linear system of equations arising from the CT image reconstruction
problem. After that, the Simultaneous Algebraic Reconstruction Technique (SART)
was developed, a modified version of ART with reduced computational
cost Anderson and Kak [3]. These two algorithms were followed by a chain
of optimization-based CT image reconstruction algorithms. In developing
such algorithms, the design of the objective function and of the optimization strategy
is the critical step, in which data constraints and prior conditions
are also devised and included Han et al. [44].
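To make the row-by-row idea behind ART concrete, here is a minimal pure-Python sketch of a Kaczmarz-style update together with the data divergence of Eq. 5.2. It is an illustration under stated assumptions, not the implementation of Gordon et al.: the tiny system matrix (five rays through a $2 \times 2$ image), the relaxation parameter `relax` and the function names are all hypothetical.

```python
import math

def art(A, y, sweeps=500, relax=1.0):
    """Kaczmarz-style ART: correct the estimate one ray equation at a time."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, y_i in zip(A, y):
            norm2 = sum(a * a for a in row)
            if norm2 == 0.0:
                continue                              # ray misses the image
            residual = y_i - sum(a * x_j for a, x_j in zip(row, x))
            step = relax * residual / norm2
            x = [x_j + step * a for a, x_j in zip(row, x)]
    return x

def data_divergence(A, x, y):
    """D(x) = |Ax - y| / M, the Euclidean misfit per measurement (Eq. 5.2)."""
    residuals = [sum(a * x_j for a, x_j in zip(row, x)) - y_i
                 for row, y_i in zip(A, y)]
    return math.sqrt(sum(r * r for r in residuals)) / len(y)

# Hypothetical 2 x 2 image (pixels 0..3) probed by five rays.
A = [[1.0, 1.0, 0.0, 0.0],    # top row
     [0.0, 0.0, 1.0, 1.0],    # bottom row
     [1.0, 0.0, 1.0, 0.0],    # left column
     [0.0, 1.0, 0.0, 1.0],    # right column
     [1.0, 0.0, 0.0, 1.0]]    # diagonal
truth = [1.0, 2.0, 3.0, 4.0]
y = [sum(a * t for a, t in zip(row, truth)) for row in A]   # noise-free data

x = art(A, y)
misfit = data_divergence(A, x, y)
```

Because the data here are generated by the imaging model itself, the misfit can be driven to (numerically) zero, in line with the remark on simulated data after Eq. 5.2.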

In this section, we would like to set up a base of optimization algorithms used for
CT image reconstruction. Once this base is set, it becomes easy to connect
optimization-based algorithms with compressed sensing.
This connection matters because compressed sensing
provides a sophisticated mathematical framework for exact signal recovery from
undersampled data measurements. In CT image reconstruction, undersampled
data measurements correspond directly to fewer projections,
which can probably be translated into dose reduction.

Coming back to optimization methods, considering Eq. 5.2, data divergence and 
image positivity, the fundamental optimization-based CT image reconstruction algo- 
rithm can be represented as follows Han et al. [44], Jorgensen et al. [55]: 


$D(x) \le \epsilon$ and $x_j \ge 0$ (5.3)


where $\epsilon > 0$ is a selected parameter to incorporate the inconsistencies between
the data and the imaging model, and $x_j$ represents the value of voxel j of the
reconstructed image x, with j = 1, 2, ..., N. Because only a small number of
measurements is available, this optimization problem admits many possible solutions.
To deal with such a situation, mathematically we can add an image sparsity constraint
to Eq. 5.3, i.e.,


$D(x) \le \epsilon$, $x_j \ge 0$ and $\|x\|_0 \le s$ (5.4)

where $\|x\|_0$ denotes the $\ell_0$-norm of the image, and s denotes the sparsity
parameter, which bounds the number of non-zero voxels allowed in the image. Eq. 5.4 is
a non-convex optimization problem with much tighter constraints than Eq. 5.3. To
handle such tight constraints, the concept of total variation (TV) has been introduced;
the TV technique is widely used in image processing. Incorporating the TV constraint
in Eq. 5.4,


$D(x) \le \epsilon$, $\|x\|_{TV} \le t_0$, $x_j \ge 0$ and $c_\alpha(x) \le \gamma$ (5.5)


where $\|x\|_{TV}$ denotes the image TV, whereas $t_0$ is a parameter which constrains
the TV of the reconstructed image. $\gamma$ is a negative number larger than $-1$ and
$c_\alpha(x)$ is defined as


$c_\alpha(x) = \dfrac{d_{TV} \cdot d_{data}}{|d_{TV}|\,|d_{data}|}$ (5.6)


where $d_{TV} = \nabla_x \|x\|_{TV}$ and $d_{data} = \nabla_x D^2(x)$. Here $\nabla_x$
indicates the gradient operator restricted to the non-zero elements of x.
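
As a small illustration of Eq. 5.6, the sketch below evaluates the cosine between two
given gradient vectors on the support of x. It assumes the gradients have already been
computed by some other routine; the function name `c_alpha`, the `tol` parameter and
the numerical values are hypothetical.

```python
import numpy as np

def c_alpha(d_tv, d_data, x, tol=0.0):
    """Cosine of the angle between the TV and data gradients,
    restricted to the non-zero elements of x (Eq. 5.6)."""
    mask = np.abs(x) > tol                    # support of x
    u, v = d_tv[mask], d_data[mask]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        raise ValueError("a gradient vanishes on the support of x")
    return float(u @ v / denom)

# Assumed toy values: the second voxel is zero and is therefore ignored.
x = np.array([1.0, 0.0, 2.0, 3.0])
d_tv = np.array([1.0, 5.0, 0.0, -1.0])
d_data = np.array([-1.0, 7.0, 0.0, 1.0])
value = c_alpha(d_tv, d_data, x)
```

Values of `value` close to -1 mean that the two gradients point in nearly opposite
directions on the support of x.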

Equation 5.5 brings non-convexity due to the $c_\alpha$ term. Another variation of the
above algorithm can be written as follows:


$D(x) \le \epsilon$, $\|x\|_{TV} \le t_0$, $x_j \ge 0$, $\|x\|_0 \le s$ and $c_\alpha(x) \le \gamma$ (5.7)


The above method is non-convex due to multiple factors, namely the $\ell_0$ constraint
and the $c_\alpha$ term.

Taken broadly, the methods explained above sum up the optimization techniques used for
CT image reconstruction. Most of the optimization techniques in the literature Ritschl
et al. [83], Sidky and Pan [85], Sidky et al. [86], Tang et al. [89] are variations of
these methods. These techniques depend on the explicit parameters on the right-hand
side of the inequalities, i.e., $\epsilon$, $t_0$, s and $\gamma$, whereas the basis
set spanning the image and data spaces depends on the choice of system matrix A.
Although there are a number of research papers on how to choose the system matrix
A, the authors of this article plan to choose it judiciously in such a way that we can
achieve perfect CT image reconstruction.

Once the stage is set with these optimization problems, the design of algorithms for
reconstruction comes into play Stiller [88]. The role of these algorithms is to solve
the optimization problems explained above under the specified constraints. The main
technique for doing so is gradient descent. In the literature, one can find different
variations of the gradient descent technique in the form of projection onto convex
sets (POCS). These POCS methods are further categorized as non-adaptive and adaptive
POCS methods Boyd and Vandenberghe [10].
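
A minimal sketch of the simplest scheme of this kind is given below: a gradient step on
the data term followed by projection onto the positivity constraint $x_j \ge 0$. It
assumes the data divergence is the Euclidean distance $\|Ax - y\|_2$; the function name
`projected_gradient`, the fixed step size and the toy system are illustrative choices,
not the algorithm of any cited paper.

```python
import numpy as np

def projected_gradient(A, y, n_iter=500):
    """Minimize ||Ax - y||_2^2 subject to x >= 0 by alternating a gradient
    step with projection onto the non-negative orthant."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # gradient of 0.5 * ||Ax - y||^2
        x = np.maximum(x - step * grad, 0.0)  # projection enforces x_j >= 0
    return x

# Assumed toy system with a non-negative ground truth.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
x_true = np.array([1.0, 0.0, 2.0])
x_rec = projected_gradient(A, A @ x_true)
```

Further convex constraints, such as a TV bound, would enter as additional projection
steps inside the loop.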

In the next sections, our aim is to explore the compressed sensing-based tech- 
niques with a focus on understanding the advantages of non-uniform sampling versus 
Nyquist sampling. 


5.4 Compressed Sensing-Based CT Image Reconstruction Algorithms


The area of compressed sensing was initiated in 2006 by groundbreaking papers 
Candes and Romberg [14], Candes and Tao [15], Candes et al. [16, 17], Donoho 
et al. [29], Donoho and Tsaig [30]. The key idea of compressed sensing is to recover 
a sparse signal from very few linear measurements by using optimization techniques. 

In the compressed sensing problem, the aim is to recover the signal of interest
$x = (x_i)_{i=1}^{n} \in \mathbb{R}^n$. We assume that x is sparse, i.e.,
$\|x\|_0 = \#\{i : x_i \ne 0\}$ is small, so that x has very few non-zero coefficients,
or that there exists an orthonormal basis or a frame $\Phi$ such that $x = \Phi c$ with
c sparse. Here $\Phi$ has the elements of the orthonormal basis or the frame as column
vectors.

Now mathematically the compressed sensing problem can be formulated as fol- 
lows. 


Estimate x from the given system of linear equations
$y = Ax$ (5.8)


or estimate c from
$y = A\Phi c$ (5.9)


In both cases, the linear system of equations is under-determined with sparsity as a 
priori information. 
The mathematical problem leads to the following questions: 


• What is a suitable sensing matrix?
• Which signals are suitable for recovery?
• Can we always do an exact recovery of the signal?


In this section, our aim is to focus on the connection of the compressed sensing
technique with CT image reconstruction. Since 2006, a lot of research has been done on
the theoretical aspects of compressed sensing, and the scientific community has applied
it to varied real-life problems. There have been attempts to use compressed sensing
tools and techniques to build CT image reconstruction algorithms with a varying number
of projections. We always view new developments in CT image reconstruction from the
standpoint of reducing the X-ray dosage while maintaining the image quality, although
the real success of compressed sensing in solving such real applications remains
debatable. The reasons are the difficulty of finding an appropriate sensing matrix A
or $A\Phi$, and the overly restrictive assumption of exact sparsity for the recovery of
the signal x.

Before getting into the details of how sparsity is used in the development of
compressed sensing algorithms, it is best to state what we mean by sparsity in the
application of CT image reconstruction. Our aim is to bring in sparsity by acquiring
fewer projections, which in a true sense amounts to non-uniform sampling. Mainly,
there are three different ways to acquire fewer projections. In the standard practice
of the filtered back projection (FBP) algorithm, data acquisition has to be uniformly
and densely sampled. For example, in Cone Beam CT image reconstruction, sparsity in
data acquisition can be brought in different ways, as shown in Fig. 5.1.

All the cases shown in Fig. 5.1a-c can be considered as fewer acquisitions of
projections, whereas in Fig. 5.1d, we have fewer projections but all sampled uniformly
and densely. Breast scanning is one such case, where the scanning can be done
only through limited views [1]. Such schemes, where the data acquisition is sparse
or uses fewer views, have been thoroughly investigated using the concept of compressed
sensing. It is much easier to design algorithms for such sampling patterns than to
design or adapt the hardware accordingly, which will always be a challenge, especially
for diagnostic CT machines. The hardware challenges related to sparse sampling are
outside the scope of this paper.

To discuss iterative/compressed sensing-based algorithms for CT image reconstruction,
we first need to prepare the setup. The image space is discretized into pixels or
voxels, depending upon whether the reconstruction is 2D or 3D. For simplicity, we
consider a 2D CT image reconstruction scenario, where the image space is discretized
into a 2D square array of size 256 x 256 and the detector has 1120 detector pixels.
The discretized image model for CT image reconstruction can then be represented as
Eq. 5.8, i.e., a system of linear equations y = Ax, where y is a vector of size M
corresponding to the total number of measurements, and x is the vectorized image of
size N. For the 256 x 256 array, N = 65,536. If the total number of projections
acquired is, say, 10 and each projection is of size 1120, then M = 11,200. Accordingly,
the size of the system matrix A is M x N. The geometrical details of fan beam or cone
beam discretization can be found in any standard text such as Kak and Slaney [56].
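
The arithmetic of this example can be checked in a few lines. The variable names are
illustrative, and the storage estimate assumes a dense float32 matrix, which is an
assumption made here only to convey scale.

```python
# Illustrative dimensions taken from the example in the text.
image_side = 256          # 2D image is 256 x 256 pixels
detector_pixels = 1120    # samples per projection
num_projections = 10      # sparse-view acquisition

N = image_side * image_side             # number of unknowns (pixels)
M = num_projections * detector_pixels   # number of measurements (rays)

print(f"A has shape {M} x {N}")
print(f"measurements per unknown: {M / N:.3f}")
# Dense storage in float32 would need M * N * 4 bytes (assumed layout).
print(f"dense float32 storage: {M * N * 4 / 2**30:.2f} GiB")
```

With fewer measurements than unknowns, the system y = Ax is under-determined.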

The CT image reconstruction model explained above fits very well with the concept of
compressed sensing. In a compressed sensing scheme, the case of interest is $M \ll N$,
and the aim is to recover x from the measurements y. Without any assumption on x, the
problem is ill-posed. However, it has been recognized that if x is sparse and A is
designed appropriately, then x may be recovered accurately Devore [27], Haupt and
Nowak [47], Indyk [53], Rauhut et al. [82], Zhang [107]. Many research papers discuss
how the measurements should be carried out, but general theoretical developments are
limited. Such developments have come as a result of statistical analysis Carin et al.
[20], Certoft et al. [21], Ralasic et al. [81], Yu and Sapiro [101], Zang and Amin
[103] and have shown promising results in array processing.

Two key concepts in the theory of compressed sensing govern perfect reconstruction,
and we discuss them in detail. The first is the concept of data coherence, and the
second is the restricted isometry property (RIP). Both concepts bear directly on the
performance of the compressed sensing inversion algorithm for the estimation of x
Bourgain et al. [9], Cai et al. [13], Candes and Tao [19], Dai and Milenkovic [25],
DeVore [28], Nguyen and Shin [76], Tropp and Gilbert [94]. In most of the literature,
the system of linear equations y = Ax has a noise component as well. In our
discussion, we will not consider the noise term explicitly.


Fig. 5.1 Four different ways for scanning an object for CT scanner (recovered labels:
X-Ray source; (c) Random Scanning Angle; (d) Limited View Scanning Angle)

Now, let’s talk about the data coherence of A, i.e., the coherence of the columns of
A, which determines how sparse x must be for it to be recovered accurately (even
perfectly) from the measurement y. The requirements on the level of sparseness are
related to the properties of A: for sufficiently sparse x and an appropriate choice of
A, the components of x are recovered accurately. To achieve accurate/perfect recovery
of the signal x, the orthogonal matching pursuit (OMP) algorithm is very popular among
image reconstruction algorithms Foucart and Rauhut [38], Needell and Tropp [74], Tropp
and Gilbert [94]. Put simply, OMP recovers x exactly if the sensing/system matrix A
and x satisfy the following inequality:


$S < \dfrac{1}{2}\left(1 + \dfrac{1}{\mu_A}\right)$ (5.10)


where S is the sparsity of x and $\mu_A$ is the mutual coherence of the column vectors
of A, defined as follows. Let $A \in \mathbb{C}^{m \times N}$ be a matrix with
$\ell_2$-normalized columns $a_1, \dots, a_N$, i.e., $\|a_i\|_2 = 1$ for all
$1 \le i \le N$. The coherence $\mu_A$ of the matrix A is given as
$\mu_A = \max_{1 \le i \ne j \le N} |\langle a_i, a_j \rangle|$ (5.11)
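
The coherence of Eq. 5.11 is cheap to compute for small matrices. The sketch below is
an illustration only; the function name `mutual_coherence` and the two test matrices
are hypothetical, and the columns are normalized inside the function so that
unnormalized input is also accepted.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns."""
    A = A / np.linalg.norm(A, axis=0)        # l2-normalize each column
    G = np.abs(A.conj().T @ A)               # absolute values of the Gram matrix
    np.fill_diagonal(G, 0.0)                 # ignore <a_i, a_i> = 1
    return float(G.max())

identity = np.eye(3)                         # orthonormal columns
redundant = np.array([[1.0, 0.0, 1.0],       # three columns in a 2D space
                      [0.0, 1.0, 1.0]])
print(mutual_coherence(identity))
print(mutual_coherence(redundant))
```

The orthonormal system has zero coherence, while adding a third column to a
two-dimensional space forces a strictly positive value.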


There are limitations on how small the coherence can be, and these limitations
depend upon the type of the matrix A. For example, $\mu_A = 0$ if and only if the
columns of A form an orthonormal system; for a square matrix, this happens if and only
if A is unitary Foucart and Rauhut [38]. But our focus is on the situation where
$A \in \mathbb{C}^{m \times N}$ with $m < N$, i.e., the framework of compressive
sensing. In such a situation, our aim is to find how small the coherence can be. Small
coherence implies that column sub-matrices of moderate size are well-conditioned. Let
$A_S$ denote the matrix formed by the columns of $A \in \mathbb{C}^{m \times N}$
indexed by the subset S of [N]. There are many results which set up conditions on A
along with the sparsity of the vector $x \in \mathbb{C}^N$. One of the earliest is
stated below for a matrix with $\ell_2$-normalized columns and s-sparse vectors x; it
leads to the invertibility of $A_S^* A_S$ when $\mu_A (s-1)$ is less than 1.


Theorem 5.1 Let $A \in \mathbb{C}^{m \times N}$ be a matrix with $\ell_2$-normalized
columns, and let $1 \le s \le N$. For all s-sparse vectors $x \in \mathbb{C}^N$,


$(1 - \mu_A(s-1))\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \mu_A(s-1))\|x\|_2^2$ (5.12)


The above inequality essentially implies that for each set $S \subseteq [N]$ with
card(S) $\le$ s, the eigenvalues of the matrix $A_S^* A_S$ lie in the interval
$[1 - \mu_A(s-1), 1 + \mu_A(s-1)]$; in particular, if $\mu_A(s-1) < 1$ then
$A_S^* A_S$ is invertible. The detailed proof is given in Foucart and Rauhut [38]. A
rich theoretical literature Donoho et al. [31], Liu and Temlyakov [66], Wang and Shim
[95] tries to simplify the conditions in Eq. 5.12. The concepts of frames and tight
frames have been used for this purpose as well.
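
The eigenvalue statement can be checked numerically on a small example. The sketch
below assumes a random real matrix with normalized columns and enumerates every
column subset of size s; the sizes m, N, s and the seed are arbitrary illustrative
choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 20, 30, 3
A = rng.standard_normal((m, N))
A /= np.linalg.norm(A, axis=0)               # l2-normalized columns

G = A.T @ A                                  # Gram matrix of the columns
mu = np.abs(G - np.eye(N)).max()             # mutual coherence of A

lo, hi = 1 - mu * (s - 1), 1 + mu * (s - 1)
ok = True
for S in itertools.combinations(range(N), s):
    eig = np.linalg.eigvalsh(G[np.ix_(S, S)])    # spectrum of A_S^* A_S, ascending
    ok &= bool(lo - 1e-12 <= eig[0] and eig[-1] <= hi + 1e-12)

print("coherence:", mu, "all eigenvalues inside the interval:", ok)
```

The number of subsets grows combinatorially with N, so a check of this kind is only
feasible for toy sizes.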

The analysis of orthogonal matching pursuit explains the performance of sparse
recovery algorithms when the coherence is small. There are many theoretical results
which guarantee the exact recovery of every s-sparse vector $x \in \mathbb{C}^N$ via
orthogonal matching pursuit and via basis pursuit when the measurement matrix has a
sufficiently small coherence.

Before getting into the details of orthogonal matching pursuit (OMP) and its
different variations, let us shed some light on another strong and interesting property
popularly known as the Restricted Isometry Property (RIP). RIP facilitates perfect
reconstruction and fits very well into the framework of compressive sensing.

The coherence is a simple and useful measure of the quality of a system/sensing
matrix. However, the lower bound on the coherence, i.e.,
$\mu_A \ge \sqrt{\dfrac{N - m}{m(N - 1)}}$ for a matrix $A \in \mathbb{C}^{m \times N}$,
is one of the major bottlenecks in achieving the desired signal quality in practical
applications. The Restricted Isometry Property (RIP), also known as the uniform
uncertainty principle, provides a finer measure of the quality of a system/sensing
matrix A. Below is the definition of RIP Foucart and Rauhut [38].


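Definition 5.1 (RIP) The sth restricted isometry constant $\delta_s = \delta_s(A)$ of
a matrix $A \in \mathbb{C}^{m \times N}$ is the smallest $\delta \ge 0$ such that


$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2$ (5.13)
for all s-sparse vectors $x \in \mathbb{C}^N$. Equivalently, it is given by


$\delta_s = \max_{S \subseteq [N],\, \mathrm{card}(S) \le s} \|A_S^* A_S - \mathrm{Id}\|_{2 \to 2}$ (5.14)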


A satisfies the RIP if $\delta_s$ is small for reasonably large s.
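
For very small matrices, the characterization in Eq. 5.14 can be evaluated by brute
force. The sketch below is illustrative only: the function name
`restricted_isometry_constant` and the toy matrix are hypothetical, and only subsets
of size exactly s are enumerated, which suffices because smaller subsets give
principal sub-matrices with no larger spectral norm.

```python
import itertools
import numpy as np

def restricted_isometry_constant(A, s):
    """Brute-force delta_s via Eq. 5.14; the cost grows combinatorially in N."""
    N = A.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(N), s):
        A_S = A[:, list(S)]
        gram = A_S.conj().T @ A_S
        # spectral norm of the Hermitian matrix A_S^* A_S - Id
        delta = max(delta, np.linalg.norm(gram - np.eye(s), 2))
    return float(delta)

# Assumed toy matrix: three normalized columns in a two-dimensional space.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
A /= np.linalg.norm(A, axis=0)
delta_1 = restricted_isometry_constant(A, 1)
delta_2 = restricted_isometry_constant(A, 2)
```

For normalized columns, `delta_1` vanishes and `delta_2` coincides with the mutual
coherence, here $1/\sqrt{2}$.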

The aim is always to find bounds under which the RIP is satisfied. Hence, just like
the coherence, it is important to know how small the sth restricted isometry constant
of a matrix $A \in \mathbb{C}^{m \times N}$ can be for it to be relevant in the context
of compressive sensing. The theorem below gives such a bound.


Theorem 5.2 For $A \in \mathbb{C}^{m \times N}$ and $2 \le s \le N$, one has


$m \ge c\,\dfrac{s}{\delta_s^2}$ (5.15)


provided $N \ge Cm$ and $\delta_s \le \delta_*$, where the constants c, C and
$\delta_*$ depend only on each other.

The literature contains detailed discussions and results that relax these bounds and
compare them with the bounds on coherence. There are also mathematical results which
try to bridge the gaps between these different bounds. Once the bounds are well
studied, efforts turn to constructing matrices satisfying the RIP of optimal order
Candes [18], Blanchard et al. [7], Zhang [108], Baraniuk et al. [6], Tilmann and
Pfetsch [91]. There are two ways to construct such matrices, deterministic and
probabilistic. Both approaches have their own set of challenges, but the probabilistic
approach is more popular than the deterministic approach.

As mentioned earlier, orthogonal matching pursuit (OMP) can reconstruct the signal
perfectly when the coherence is small. In a similar fashion, Basis Pursuit (BP)
succeeds in sparse recovery of the signal when the system/sensing matrix has a small
restricted isometry constant. The theorem below gives such a guarantee.


Theorem 5.3 Suppose that the 2sth restricted isometry constant of the matrix
$A \in \mathbb{C}^{m \times N}$ satisfies


$\delta_{2s} < \dfrac{1}{3}$ (5.16)


Then every s-sparse vector $x \in \mathbb{C}^N$ is the unique solution of


$\min_{z \in \mathbb{C}^N} \|z\|_1$ subject to $Az = Ax$

Other results in the literature provide simplified bounds and construct matrices
which allow s-sparse recovery and whose sth-order restricted isometry constant is
arbitrarily close to 1.

Now it’s time to talk about the two main algorithms, OMP and BP, which are the
theoretical success story for the concepts established in coherence and RIP. The
literature has many different variants of these algorithms Foucart and Rauhut [38],
Wang et al. [96]. OMP and its variants come under the category of greedy algorithms
Li et al. [64], Tropp [93]. The pseudo-code in Algorithm 1 lists the steps of
orthogonal matching pursuit Foucart and Rauhut [38]. The pseudo-code in Algorithm 2
lists the steps of iterative hard thresholding, a thresholding-based alternative to
solving basis pursuit directly as an $\ell_1$ minimization problem Blumensath and
Davis [8]. There are different ways to implement the same Cai and Wang [12], Davis et
al. [26], Jain et al. [54], Nin et al. [77], Timbshirani [92].


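Algorithm 1 OMP (A, y)

Input: A, y

Result: $x_k$

Initialization: $r_0 = y$, $\Lambda_0 = \emptyset$

for k = 1, 2, ... do
Step 1. $\lambda_k = \arg\max_{j \notin \Lambda_{k-1}} |\langle a_j, r_{k-1} \rangle|$
Step 2. $\Lambda_k = \Lambda_{k-1} \cup \{\lambda_k\}$
Step 3. $x_k(i \in \Lambda_k) = \arg\min_x \|A_{\Lambda_k} x - y\|_2$ and $x_k(i \notin \Lambda_k) = 0$
Step 4. $\hat{y}_k = A x_k$
Step 5. $r_k \leftarrow y - \hat{y}_k$

end for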
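
The steps of Algorithm 1 translate almost line by line into NumPy. The sketch below is
an illustration under stated assumptions: the function name `omp`, the fixed iteration
count `n_iter` used as stopping rule, and the test matrix (an identity block next to a
scaled 16 x 16 Hadamard block, with coherence 1/4) are choices made here and do not
come from the source.

```python
import numpy as np

def omp(A, y, n_iter):
    """Orthogonal matching pursuit following the five steps of Algorithm 1."""
    N = A.shape[1]
    support = []                              # Lambda_0 is empty
    residual = y.copy()                       # r_0 = y
    x = np.zeros(N, dtype=A.dtype)
    for _ in range(n_iter):
        corr = np.abs(A.conj().T @ residual)
        if support:
            corr[support] = 0.0               # search only outside the support
        support.append(int(np.argmax(corr)))  # steps 1 and 2
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # step 3
        x = np.zeros(N, dtype=A.dtype)
        x[support] = coef
        residual = y - A @ x                  # steps 4 and 5
    return x

# Assumed test matrix with unit-norm columns and coherence 1/4.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H16 = np.kron(np.kron(H2, H2), np.kron(H2, H2))
A = np.hstack([np.eye(16), H16 / 4.0])
x_true = np.zeros(32)
x_true[[3, 20]] = [1.5, -2.0]                 # a 2-sparse signal
x_hat = omp(A, A @ x_true, n_iter=2)
```

Here S = 2 satisfies Eq. 5.10, since (1 + 1/0.25)/2 = 2.5, so two iterations are
enough to recover the signal.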


Apart from the popular pseudo-code presented in the paper, there are variations 
of orthogonal matching pursuit and basis pursuit algorithms. Gradient pursuit and 
CoSaMP are popular variations for OMP, whereas hard thresholding and soft thresh- 
olding are commonly known implementations of basis pursuit. There are model- 
based implementations as well, such as model-based CoSaMP and model-based 
iterative hard thresholding Pope [80]. 
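

Algorithm 2 Iterative Hard Thresholding
Input: A, y and sparsity level k
Result: $\hat{x}$
Initialization: $\hat{x}_0 = 0$
for i = 1, 2, ... until stopping criterion is met
do
$\hat{x}_i = H_k(\hat{x}_{i-1} + A^T(y - A\hat{x}_{i-1}))$
end for

Here $H_k$ is the hard thresholding operator, which keeps the k largest-magnitude
entries of its argument and sets all other entries to zero.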
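
A direct transcription of Algorithm 2 is sketched below. The function names
`hard_threshold` and `iht` and the fixed iteration count are assumptions. The toy
matrix is deliberately orthogonal so that the behavior is easy to verify by hand; for
under-determined matrices, convergence depends on properties of A such as its scaling
and its restricted isometry constants.

```python
import numpy as np

def hard_threshold(v, k):
    """H_k: keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]         # indices of the k largest magnitudes
    out[keep] = v[keep]
    return out

def iht(A, y, k, n_iter=100):
    """Iterative hard thresholding with unit step size, as in Algorithm 2."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), k)
    return x

# Assumed toy case: an orthogonal 4 x 4 matrix (scaled Hadamard).
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Q = np.kron(H2, H2) / 2.0
x_true = np.array([0.0, 3.0, 0.0, -1.0])
x_hat = iht(Q, Q @ x_true, k=2)
```

For this orthogonal matrix the first iteration already returns the 2-sparse signal, and
later iterations leave it unchanged.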


The theory of compressive sensing for signal reconstruction is well established, but
as the theorems and bounds discussed above indicate, constructing the system matrix is
always a challenging task. A detailed study of compressive sensing in medical imaging
has been done in Christian et al. [24]; the main focus of that review is on total
variation-based optimization methods. In CT image reconstruction, for example, finding
the system matrix without any a priori information is like finding a needle in a
haystack. The same holds for other real-life applications. In the next section, the
authors discuss the concept of sampling, with a special focus on non-uniform sampling
Landau [61].


5.5 Nyquist Sampling Versus Non-uniform Sampling 


The Nyquist-Shannon sampling theorem establishes a sufficient condition on the sample
rate that allows a continuous-time signal of finite bandwidth to be discretized in such
a way that no information is lost when the signal is reconstructed. The theorem applies
only to bandlimited signals, i.e., functions whose Fourier transform has bounded
support. Suppose a function contains no frequencies higher than B hertz; then,
according to the Nyquist-Shannon sampling theorem, any sample rate larger than 2B
samples per second is sufficient to guarantee perfect reconstruction, i.e., $f_s > 2B$.
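
The effect of violating $f_s > 2B$ can be seen on a single tone. The sketch below
assumes a 3 Hz cosine and two sampling rates chosen for illustration; the helper name
`sample` and all numerical values are hypothetical.

```python
import numpy as np

B = 3.0                      # highest frequency in the signal, in hertz
fs_low = 4.0                 # violates f_s > 2B = 6
fs_ok = 8.0                  # satisfies f_s > 2B

def sample(freq, fs, n=16):
    """Return n uniform samples of cos(2*pi*freq*t) taken at rate fs."""
    t = np.arange(n) / fs
    return np.cos(2 * np.pi * freq * t)

alias = fs_low - B           # 1 Hz tone that shares the 3 Hz samples at fs_low
same_at_low = np.allclose(sample(B, fs_low), sample(alias, fs_low))
same_at_ok = np.allclose(sample(B, fs_ok), sample(alias, fs_ok))
print(same_at_low, same_at_ok)
```

At 4 samples per second the 3 Hz and 1 Hz tones produce identical samples, so they
cannot be told apart without extra information; at 8 samples per second they differ.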

As the Nyquist-Shannon theorem is a sufficient condition, perfect reconstruction may
still be possible when the sampling rate criterion is not met. In some such cases, it’s
possible to have almost perfect reconstruction by imposing additional constraints. The
fidelity of these constraints can be verified and quantified using Bochner’s theorem
Nemirovsky and Shimron [75]. Bochner’s theorem characterizes the Fourier transform of a
positive finite Borel measure on the real line. From a harmonic analysis point of view,
Bochner’s theorem asserts that, under the Fourier transform, a continuous
positive-definite function on a locally compact abelian group corresponds to a finite
positive measure on the Pontryagin dual group Feng [35], Mishali and Eldar [70], Zayed
[104].

In a similar fashion, H. J. Landau’s work presented in the paper “Necessary density
conditions for sampling and interpolation of certain entire functions” Landau [61]
tries to establish the concept of non-uniform sampling.

The main motivation for this section is to emphasize that compressive sensing or
$\ell_1$ minimization should be used in connection with the well-established
mathematical concepts of non-uniform sampling. Especially in the domain of medical
imaging, it is essential not to miss even the smallest piece of information, since
doing so can be a fatal mistake for diagnosis or treatment. The search for, and design
of, perfect reconstruction algorithms is ongoing. Hence, there is a need to exploit the
mathematical conditions on the class of functions which can provide a sufficient
condition on a non-uniform sampling rate for perfect reconstruction Eng [33], Marvast
[67], Yaroslavsky et al. [100]. In brief, optimization algorithms or compressive
sensing-based algorithms should go hand in hand with the mathematical concepts from
the non-uniform sampling literature.


5.6 Current and Future Developments 


Machine learning and artificial intelligence are among the most popular tools and
techniques, spreading into every domain and aspect of life like wildfire. There are
mainly three types of learning models: supervised, semi-supervised and unsupervised
learning. Another very popular technique is reinforcement learning. Of all the methods
developed and established in machine learning, currently the most popular and
successful are artificial neural networks and deep learning. The most basic unit of a
neural network is the perceptron, which can be compared with the standard statistical
method known as regression. A single perceptron is capable of prediction as well as
two-class linear classification. Its performance depends on how well the weights are
adjusted and on the type of activation function. From this basic unit, a complete
network is constructed in layers: an input layer, an output layer and multiple hidden
layers in between. The number of hidden layers mostly depends on the type of
application, and most variations depend upon the choice of the activation function.
The performance of these models depends upon how well the network has been trained.
During training, the calculated output is compared to the desired output by a loss or
error function. Once training is completed, the training error should match the testing
error for the network to succeed at its future task. Once the network has been trained
and tested to satisfaction, the model is ready for prediction, classification or its
designed task LeCun et al. [62].
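
A single perceptron doing two-class linear classification can be sketched in a few
lines. The example assumes the classical error-driven weight update with a step
activation and uses the logical AND of two inputs as a toy data set; the function name
`train_perceptron` and all parameter values are illustrative.

```python
import numpy as np

def train_perceptron(X, labels, epochs=20, lr=1.0):
    """Error-driven updates with a step activation; labels are 0 or 1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, labels):
            pred = 1 if xi @ w + b > 0 else 0     # step activation
            w += lr * (target - pred) * xi        # adjust weights on mistakes only
            b += lr * (target - pred)
    return w, b

# Toy, linearly separable data: logical AND of two binary inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, labels)
preds = [(1 if xi @ w + b > 0 else 0) for xi in X]
```

Replacing the step function by a smooth activation and stacking many such units in
layers leads to the networks described in the paragraph above.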

Artificial neural networks (ANN) and deep learning (DL) have served multiple
objectives in radiology McBee et al. [68]. The most common tasks are classification,
detection, segmentation and so on. A classification task in radiology includes
deciding whether a lesion is benign or malignant Chartrand et al. [22]. The existence
of lesions and their location are the focus of detection tasks. A promising example is
the study by Lakhani et al., who trained two DNNs (Deep Neural Networks) on the
detection of tuberculosis Lakhani [60]. The network showed great results with a
sensitivity of 97.3% and specificity of 94.7%.


Fig. 5.2 Basic deep learning CT reconstruction workflow

The focus of this review paper is on CT image reconstruction algorithms. Considerable
effort is going into developing deep learning-based image reconstruction algorithms
Zhang and Dong [106] with the objective of improving image quality in comparison to
existing algorithms and, if possible, reducing the X-ray dosage Greffier et al. [43],
Hashemi et al. [46], Kang et al. [58] (Fig. 5.2).

The main motivation for using machine learning, especially deep learning, over the
classical CT image reconstruction algorithms is that these standard and widely used
methods are sensitive to noise and to incompleteness of the measured data, as explained
in the previous sections Christian et al. [24]. Efforts in deep learning for CT image
reconstruction aim to combine classical, well-established methods with deep learning
models. The current focus on deep learning is due to its ability to represent complex
data as well as its end-to-end training capabilities. In combining the classical
methods with a deep learning model, mainly two approaches are adopted. The first is
post-processing, and the second goes from raw data to the reconstructed image. In both
approaches, a mapping is established. In the post-processing case, the mapping is
between the initially reconstructed low-quality image and its high-quality counterpart.
In the second approach, the mapping is from projection data to the reconstructed image.
Since our focus is on CT image reconstruction techniques, we will only talk about the
second approach, i.e., a mapping from projection data to the reconstructed image
established using the deep learning model.

The deep learning model takes an optimization algorithm and unrolls it into a
feed-forward network. Then, based on a priori knowledge of the problem, one determines
which parameters are best learned from the data to obtain the final reconstructed
image. The main advantage is that the deep learning architecture becomes more
manageable and has fewer parameters to learn Arudt et al. [4], Christian et al. [24],
McBee et al. [68].

No matter which algorithm or hardware technology we use, our core objective is to
reduce the radiation dose and acquisition time. From an algorithmic point of view,
this objective can be achieved mainly by reducing the number of projections. A major
problem with reducing the number of projections is streaking artifacts. These streaking
artifacts cover the complete reconstructed image, but they can be considered simple
structures in classical reconstruction methods. A deep learning model with a
multi-scale architecture can capture such global patterns of streaking artifacts. One
of the best architectures for reducing such artifacts in sparse-view CT image
reconstruction is popularly known as U-Net, shown in Fig. 5.3, Chartrand et al. [22],
Greffier et al. [43]. There are other well-known deep learning architectures, such as
ADMM-Net, Primal-Dual Net and JSR-Net Christian et al. [24].


Fig. 5.3 U-Net architecture. Each blue box corresponds to a multi-channel feature map.
The number of channels is denoted on top of the box. The X-Y size is provided at the
lower left edge of the box. White boxes represent copied feature maps. The arrows
denote the different operations Ronneberger et al. [84]

Recently, deep learning-based models have come to dominate the medical imaging domain
as well, but in the authors’ opinion it is too early to judge the success of these
models. Deep learning models also face plenty of challenges, which limit their
application and implementation in clinical practice. For supervised learning methods,
obtaining labeled data is itself a difficult task, and in the medical imaging domain
privacy issues add to these challenges. Strict regulations also make it difficult for
technical experts to come up with algorithms whose results can solely be used for
decision-making by radiologists Zhang and Dong [106].


5.7 Conclusion and Open Question


In this article, the authors have tried to give a brief tour of the development of
computed tomography image reconstruction algorithms. Despite all the developments over
the years and the recent dominance of machine learning techniques and, more
importantly, deep learning models, the filtered back-projection algorithm and its
variations still dominate image reconstruction in industrial CT machines. Diagnostic
decisions in the medical domain depend heavily upon the images reconstructed from
different modalities such as MRI, Ultrasound and CT. Due to legal reasons as well as
the direct impact on human life, it will always be a challenge for medical experts to
accept probabilistic models. From a scientific/technical perspective, we believe that
sufficient effort should go into exploring or designing models which have mathematical
support in the form of provable sufficient conditions for perfect reconstruction.

Can we use the concept of sparsity and a priori knowledge to satisfy the conditions
of the restricted isometry property and/or mutual coherence, and thereby obtain perfect
reconstruction in the compressive sensing framework? The authors of this paper strongly
believe that it’s possible to construct a basis set, which essentially is the sensing
matrix A, with sufficient a priori knowledge for perfect/almost perfect reconstruction
in the framework of a linear system of equations for CT image reconstruction as
described above.


Chapter 6
Shapley Value and Other Axiomatic
Extensions to Shapley Value


T. E. S. Raghavan 


6.1 Shapley Value 


A cooperative TU (Transferable Utility) game consists of a player set N = {1, 2, ..., n},
and with each subset $S \subseteq N$ (called a coalition) we associate a real number
v(S) called the worth of the coalition S.

Any coalition S can be identified with an n-vector whose ith coordinate is 1 if
$i \in S$ and 0 if $i \notin S$. Thus for N = {1, 2, 3, 4} and S = {1, 3, 4}, we
associate the 4-tuple $S \leftrightarrow (1, 0, 1, 1)$. For each coalition S, v(S)
measures what coalition S on its own can achieve by full cooperation of its members.
Thus $v : 2^N \to \mathbb{R}$, where $v(\emptyset) = 0$.

The following examples are just mild variations of the examples in the given 
references. 


Example 1 ([1]) 


Residents R, D, and U of a house have the responsibility to mow the lawn in front of
their house. While R has the strength to start the lawn mower and even mow half the
lawn all by himself, ladies U and D need the help of R to start the engine. With his
help, U can finish mowing the entire lawn or complete mowing unfinished parts. With
R’s help, D can complete 3/4th of the lawn. Of course, with R’s help in starting the
engine, U and D can also complete mowing the entire lawn. This game can be represented
by the $2^3$-tuple $v(\emptyset) = 0$, v(R) = 1/2, v(U) = 0, v(D) = 0, v(UD) = 0,
v(RU) = 1, v(RD) = 3/4, v(RUD) = 1.

We would like to evaluate the relative position and strength of each participant in 
this cooperative venture. 


Example 2 ([4])


A landlord who is handicapped owns all the lands in a village with n homogeneous
landless workers. If he employs k workers and the workers work sincerely, he can
harvest f(k) bags of rice. Suppose f(x) is a concave increasing function with
f(0) = 0; then we would like to find a way to share the total output among the
landlord and the workers.


Example 3 ([16]) 


An Indian village in Tamil Nadu consists of people belonging to different castes. 
Suppose the following are the voting population size. 


Pillais      300
Nainars      130
Kaunders     400
Chettiars    100
Brahmins      70
Total       1000


While voters by definition have the freedom to make their own decision on any agenda, in reality voting often goes along caste lines en bloc. Thus we can think of 5 leaders, one for each caste, playing the game with block votes. Any agenda will pass if it has the support of a majority of voters. In our case we have 5 players called P, N, K, C, B, where B has weight $W_B = 70$. Similarly $W_N = 130$, $W_K = 400$, $W_C = 100$, $W_P = 300$. We can represent this as a weighted majority game with quota $q = 501$: a coalition of castes wins if its total weight is at least $q = 501$. We represent this as

$$[q;\ W_B, W_C, W_K, W_N, W_P]$$

where for any $S \subseteq \{B, C, K, N, P\}$

$$v(S) = \begin{cases} 1 & \text{if } \sum_{i \in S} W_i \ge q, \\ 0 & \text{otherwise.} \end{cases}$$

This is called a simple game and is also called a weighted majority game.


We want to evaluate the relative strength of each caste in their power to pass 
successfully an agenda, seeking cooperation from a few castes. 
Example 4 (Airport Game) ([14])


The runways in an airport have to cater to planes of various lengths that require safe landing. The capital cost of building a runway depends on the longest plane that will land in the airport. Let $C_1 < C_2 < \cdots < C_m$ be the capital costs for constructing runways for planes of increasing length. If $n_j$ planes of length $l_j$ are to use this airport, how should the capital cost be split among the various planes, taking the capital cost to be $\max_j C_j$?


If only planes of lengths $l_{i_1} < l_{i_2} < \cdots < l_{i_k}$ use the airport and $n_{i_r}$ planes of length $l_{i_r}$ will be using the facility, then the capital cost to be shared is $C_{i_k}$ and the number of planes using it is $\sum_{r=1}^{k} n_{i_r}$. Again we would like some scheme to determine the portion of the total capital cost to be collected from each plane in a fair distribution of costs.


Example 5 (Tree Game) ([10])


There are 7 villages to be connected to a TV cable service. The cost of building the cable connecting the main TV station to villages 1 and 2 is 12K. A cable connecting villages 1, 2 to village 3 costs 1K. A cable connecting villages 1, 2 to village 4 costs 3K. Villages 5 and 6 can be connected by cable from village 4 for a cost of 6K, and village 7 can be connected to the cable service from village 4 for a cost of 5K. The following tree describes the situation.


[Figure: cost tree rooted at the TV station, with the cable costs listed above on its edges.]


If the total cost $12 + 1 + 3 + 5 + 6 = 27$ (in K) of building the cable network is to be shared among the 7 villages, how should they split the cost among them?
Unanimity Games
Imagine 7 children enjoying a party at my home. My wife prepared a nice cake and is willing to leave it with any set of children to distribute among themselves with one stipulation. The set must include all of my children.

If $N = \{1, 2, 3, 4, 5, 6, 7\}$ and my children are 1, 2, 3, then any subset $S$ of $N$ can get the whole cake as long as $S \supseteq \{1, 2, 3\}$. We call this the unanimity game, where

$$v(S) = \begin{cases} 1 & \text{if } S \supseteq \{1, 2, 3\}, \\ 0 & \text{if } S \not\supseteq \{1, 2, 3\}. \end{cases}$$

(i) Observe that $v$ treats all of my children symmetrically.
(ii) Observe that this characteristic function satisfies $v(S) = v(S \cup i)$ for any $i = 4, 5, 6, 7$ and for any $S \not\ni i$. We call such players dummies.
(iii) When a group of children get the cake, we want to make sure they cannot waste any part but must share the whole cake between them. We call $(x_1, x_2, \ldots, x_7)$ an imputation if $x_i \ge v(\{i\})$ and $\sum_i x_i = 1$. (We stipulate efficiency.)


Since my children are treated symmetrically without any order based on their age, etc., it is only reasonable that any imputation suggested satisfy $x_1 = x_2 = x_3$. Condition (ii) says that neither the presence nor the absence of the other children has any influence on what a coalition gets. We call them dummies, and it is only fair that they are neither punished nor positively rewarded, as they are at the mercy of my children in any sharing. Shapley stipulates 0 reward for them. That is, $x_i = 0$ if $i \ne 1, 2, 3$. Since no cake is to be wasted, $\sum_i x_i = 1$, so $x_1 + x_2 + x_3 = 1$ and $x_1 = x_2 = x_3$ give $x_1 = x_2 = x_3 = \frac{1}{3}$.

If we view any TU game $(v, N)$ as a vector in $2^{|N|}$ dimensions, then we have the
following.


Proposition 6.1 Unanimity games form a basis for the vector space of all TU games. 


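Proof We will denote by $v^T$ the unanimity game

$$v^T(S) = \begin{cases} 1 & \text{if } S \supseteq T, \\ 0 & \text{if } S \not\supseteq T. \end{cases}$$

We have $2^{|N|}$ unanimity games, one for each $T \subseteq N$. Suppose they are linearly dependent. Then we have $\sum_{T \subseteq N} \alpha_T v^T = 0$ for some $\alpha_T \ne 0$.

Choose $T_0$ with $\alpha_{T_0} \ne 0$ such that $|T| \ge |T_0|$ whenever $\alpha_T \ne 0$. From the finitely many coalitions we can always find such a $T_0$. Consider

$$\sum_{T} \alpha_T v^T(T_0) = \alpha_{T_0} v^{T_0}(T_0) + \sum_{T \ne T_0} \alpha_T v^T(T_0).$$

Let $T \ne T_0$. If $T \subset T_0$, then $|T| < |T_0|$ and hence $\alpha_T = 0$; if $T \not\subseteq T_0$, then $v^T(T_0) = 0$. Thus the only term that remains on the left side of the equation

$$\sum_{T \subseteq N} \alpha_T v^T = 0,$$

evaluated at $T_0$, is $\alpha_{T_0} v^{T_0}(T_0) = \alpha_{T_0} \ne 0$, a contradiction.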
Shapley value ([15])
With each unanimity game $(v^T, N)$ we can associate the unique imputation $\phi(v^T)$, called the Shapley value, given by

$$\phi_i(v^T) = \begin{cases} \frac{1}{t} & \text{if } i \in T \ (\text{here } |T| = t), \\ 0 & \text{if } i \notin T. \end{cases}$$

Further, it trivially satisfies


1. $\phi_{\sigma(i)}(\sigma v^T) = \phi_i(v^T)$ (here $\sigma$ is any permutation of $N$) (Symmetry).
2. $\sum_{i \in N} \phi_i(v^T) = 1$ (Efficiency).
3. $\phi_i(v^T) = 0$ if $i$ is a dummy player.


Theorem 6.1 There exists a unique imputation $\phi$, called the Shapley value, for all TU games $(v, N)$ with $N$ fixed satisfying properties 1 to 3 above and


4. $\phi_i(v + w) = \phi_i(v) + \phi_i(w)$, where $(v, N)$, $(w, N)$, $(v + w, N)$ are TU games (Linearity).


Proof Since unanimity games form a basis of all $N$-person TU games, we can write $v = \sum_{T \subseteq N} \alpha_T v^T$ for unique $\alpha_T$'s. In fact, the $\alpha_T v^T$ are themselves TU games with $\alpha_T v^T(N) = \alpha_T$, and linearity determines

$$\phi_i(v) = \sum_{T \subseteq N} \alpha_T \phi_i(v^T) = \sum_{T \subseteq N} \phi_i(\alpha_T v^T).$$


Lemma 6.1 If $v = \sum_{T \subseteq N} \alpha_T v^T$, then

$$\alpha_T = \sum_{S \subseteq T} (-1)^{t-s} v(S), \qquad |T| = t,\ |S| = s.$$


Proof Let $U$ be an arbitrary coalition with $|U| = u$. We will check that $v(U) = \sum_{T \subseteq N} \alpha_T v^T(U)$. Since $v^T(U) = 1$ when $U \supseteq T$ and $v^T(U) = 0$ otherwise,

$$\sum_{T \subseteq N} v^T(U) \sum_{S \subseteq T} (-1)^{t-s} v(S) = \sum_{T \subseteq U} \sum_{S \subseteq T} (-1)^{t-s} v(S). \qquad (6.1)$$

Changing the order of summation on the right-hand side of (6.1), we get

$$\sum_{S \subseteq U} v(S) \sum_{S \subseteq T \subseteq U} (-1)^{t-s} = \sum_{S \subseteq U} v(S) \sum_{t=s}^{u} \binom{u-s}{t-s} (-1)^{t-s} = \sum_{S \subseteq U} v(S)\,(1-1)^{u-s} = v(U).$$

Here the terms with $s < u$ vanish, and $(1-1)^0 = 1$ by convention when $S = U$.
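The inversion formula of Lemma 6.1 can be checked numerically. The following is a minimal sketch, assuming the lawn mowing game of Example 1 as test data; the helper names `subsets`, `dividends`, `rebuild` and `lawn` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def subsets(players):
    """All subsets of `players` as frozensets (including the empty set)."""
    items = list(players)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def dividends(players, v):
    """alpha_T = sum over S subset of T of (-1)^(|T|-|S|) v(S), for nonempty T."""
    return {T: sum((-1) ** (len(T) - len(S)) * v(S) for S in subsets(T))
            for T in subsets(players) if T}

def rebuild(alpha, U):
    """Evaluate sum_T alpha_T v^T(U); v^T(U) = 1 exactly when T is a subset of U."""
    return sum(a for T, a in alpha.items() if T <= U)

# Lawn mowing game of Example 1 (coalitions not listed have worth 0).
WORTH = {frozenset("R"): Fraction(1, 2), frozenset("RU"): Fraction(1),
         frozenset("RD"): Fraction(3, 4), frozenset("RUD"): Fraction(1)}

def lawn(S):
    return WORTH.get(frozenset(S), Fraction(0))

alpha = dividends("RUD", lawn)
for T, a in sorted(alpha.items(), key=lambda item: (len(item[0]), sorted(item[0]))):
    print("".join(sorted(T)), a)
```

Exact rational arithmetic (`Fraction`) is used so that the identity $v(U) = \sum_T \alpha_T v^T(U)$ can be compared with `==` rather than up to rounding.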


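Theorem 6.2 For any arbitrary TU game $v$, the Shapley value $\phi$ is unique and is given
by

$$\phi_i(v) = \sum_{S \subseteq N,\ S \ni i} \frac{(s-1)!\,(n-s)!}{n!}\, [v(S) - v(S \setminus i)].$$


Proof Since $v = \sum_{T \subseteq N} \alpha_T v^T$,

$$\phi_i(v) = \sum_{T \subseteq N} \alpha_T \phi_i(v^T) = \sum_{T \ni i,\ |T| = t} \frac{\alpha_T}{t} = \sum_{T \ni i} \frac{1}{t} \sum_{S \subseteq T} (-1)^{t-s} v(S).$$

Let

$$h_i(S) = \sum_{T \supseteq S,\ T \ni i} (-1)^{t-s}\, \frac{1}{t},$$

where $|S| = s$ and $|T| = t$. If $S \ni i$, then $h_i(S \setminus i) = -h_i(S)$. Thus $\phi_i(v) = \sum_{S \subseteq N,\ i \in S} h_i(S)\,[v(S) - v(S \setminus i)]$. For $S \ni i$ we have $\binom{n-s}{t-s}$ coalitions $T$ with $t$ elements such that $S \subseteq T$. Thus

$$\begin{aligned}
h_i(S) &= \sum_{t=s}^{n} (-1)^{t-s} \binom{n-s}{t-s} \frac{1}{t} \\
&= \sum_{t=s}^{n} (-1)^{t-s} \binom{n-s}{t-s} \int_0^1 x^{t-1}\, dx \\
&= \int_0^1 \sum_{t=s}^{n} (-1)^{t-s} \binom{n-s}{t-s} x^{t-1}\, dx \\
&= \int_0^1 x^{s-1} \sum_{t=s}^{n} (-1)^{t-s} \binom{n-s}{t-s} x^{t-s}\, dx \\
&= \int_0^1 x^{s-1} (1 - x)^{n-s}\, dx = \frac{(s-1)!\,(n-s)!}{n!}.
\end{aligned}$$

Hence the value of $\phi_i(v)$, where $i \in S$, $|S| = s$, is given by

$$\phi_i(v) = \sum_{S \subseteq N,\ S \ni i} \frac{(s-1)!\,(n-s)!}{n!}\, [v(S) - v(S \setminus i)].$$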

The formula has a certain intuitive appeal. Suppose the players arrive in a random order. If player $i$ is the last one to become a member of a coalition $S$, then his marginal contribution to the existing coalition $S \setminus i$ is $v(S) - v(S \setminus i)$. The chance that the members of $S \setminus i$ arrive first, in some order, and are immediately followed by $i$ in a random formation of the grand coalition is $\frac{(s-1)!\,(n-s)!}{n!}$. Thus the Shapley value is simply the expected marginal contribution of a player when the coalitions form one by one to reach the grand coalition in a random order.
We will work out the Shapley value for some of our examples above.
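The closed formula of Theorem 6.2 and its random-order interpretation can be compared directly. The sketch below is an illustration under stated assumptions, not the author's code: the function names are hypothetical, and the test game is the lawn mowing game of Example 1 with $v(R) = 1/2$ and $v(RD) = 3/4$.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def shapley_formula(players, v):
    """phi_i = sum over S containing i of (s-1)!(n-s)!/n! * [v(S) - v(S \\ i)]."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for r in range(n):
            for rest in combinations(others, r):
                S = frozenset(rest) | {i}
                s = len(S)
                weight = Fraction(factorial(s - 1) * factorial(n - s), factorial(n))
                total += weight * (v(S) - v(S - {i}))
        phi[i] = total
    return phi

def shapley_orderings(players, v):
    """Average marginal contribution over all n! orders of arrival."""
    totals = {p: Fraction(0) for p in players}
    for order in permutations(players):
        seen = frozenset()
        for p in order:
            totals[p] += v(seen | {p}) - v(seen)
            seen = seen | {p}
    return {p: t / factorial(len(players)) for p, t in totals.items()}

# Lawn mowing game of Example 1 (coalitions not listed have worth 0).
WORTH = {frozenset("R"): Fraction(1, 2), frozenset("RU"): Fraction(1),
         frozenset("RD"): Fraction(3, 4), frozenset("RUD"): Fraction(1)}

def lawn(S):
    return WORTH.get(frozenset(S), Fraction(0))

PLAYERS = ["R", "U", "D"]
by_formula = shapley_formula(PLAYERS, lawn)
by_orderings = shapley_orderings(PLAYERS, lawn)
print(by_formula)
```

Both routes enumerate exponentially many objects, so this is only practical for a handful of players.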
Lawn Mowing Problem
We are given 3 players R, U, and D with $v(R) = \frac{1}{2}$, $v(RU) = 1$, $v(RD) = \frac{3}{4}$, $v(U) = v(D) = v(UD) = 0$, $v(RUD) = 1$.
Coalition formation: say U D R, where U is first, then D joins and then R joins.
By assumption $v(U) - v(\emptyset) = 0$, $v(UD) - v(U) = 0$, $v(UDR) - v(UD) = 1$.
Thus the players U, D, R collect their respective marginal contributions 0, 0, 1.
Thus we can write down the following table of each one's marginal contribution:


Order of entry    Marginal contribution of
                  U      D      R
U D R             0      0      1      (U enters first, then D, then R)
U R D             0      0      1
D U R             0      0      1
D R U             1/4    0      3/4    (D enters first, then R, then U)
R D U             1/4    1/4    1/2
R U D             1/2    0      1/2

Thus
U gets (0 + 0 + 0 + 1/4 + 1/4 + 1/2)/6, i.e., $\phi(U) = \frac{1}{6}$,
D gets (0 + 0 + 0 + 0 + 1/4 + 0)/6, i.e., $\phi(D) = \frac{1}{24}$,
R gets (1 + 1 + 1 + 3/4 + 1/2 + 1/2)/6, i.e., $\phi(R) = \frac{19}{24}$.


Landlord Versus Workers 


Suppose the landlord hires all the $n$ workers in the village, who are landless farmers.
There are $(n+1)!$ possible arrangements.

When the landlord L is the first to join the empty coalition $\emptyset$ and the rest follow in a random order, his marginal contribution is 0. This happens in $n!$ of the $(n+1)!$ possible permutations. More generally, L's marginal contribution if L is the $i$th to enter is $f(i-1)$, $i = 1, 2, \ldots, n+1$, and each position is equally likely. Thus his expected marginal contribution is $\frac{1}{n+1} \sum_{i=0}^{n} f(i)$. Since $f(0) = 0$,

$$\phi(L) = \frac{1}{n+1} \sum_{i=1}^{n} f(i).$$

Since the farmers are homogeneous, $\frac{1}{n}\left[f(n) - \frac{1}{n+1} \sum_{i=1}^{n} f(i)\right]$ is the Shapley value for
each farmer.
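The closed form for the landlord can be compared with a brute-force average over all arrival orders. This is a small sketch under assumptions: the production function `f` and the choice $n = 4$ are hypothetical illustration data (any concave increasing $f$ with $f(0) = 0$ would do), and the helper names are not from the text.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def landlord_game(f):
    """Worth of a coalition: f(number of workers) if the landlord 'L' is present, else 0."""
    def v(S):
        return f(len(S) - 1) if "L" in S else 0
    return v

def shapley_by_orderings(players, v):
    """Average marginal contribution over all orders of arrival."""
    totals = {p: Fraction(0) for p in players}
    for order in permutations(players):
        seen = set()
        for p in order:
            before = v(seen)
            seen.add(p)
            totals[p] += v(seen) - before
    return {p: t / factorial(len(players)) for p, t in totals.items()}

n = 4

def f(k):
    # Hypothetical output: concave and increasing on 0..5, with f(0) = 0.
    return 10 * k - k * k

players = ["L"] + [f"W{j}" for j in range(1, n + 1)]
phi = shapley_by_orderings(players, landlord_game(f))
closed_form_L = Fraction(sum(f(i) for i in range(1, n + 1)), n + 1)
closed_form_W = (f(n) - closed_form_L) / n
print(phi["L"], closed_form_L, phi["W1"], closed_form_W)
```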


Village Election 


Assume that the Brahmins en bloc have lost any interest in voting. In this case we have only 930 voters in all, and a simple majority needs 466 votes. Thus the competition is only among the 4 castes with first letters P, N, K, C. Let us enumerate all the 24 possible random arrangements as follows:

PN[K]C    K[N]PC
PN[C]K    K[N]CP
P[K]NC    K[P]NC
P[K]CN    K[P]CN
PC[N]K    K[C]NP
PC[K]N    K[C]PN
NP[K]C    CP[N]K
NP[C]K    CP[K]N
NC[K]P    C[K]PN
NC[P]K    C[K]NP
N[K]PC    CN[P]K
N[K]CP    CN[K]P


A player is pivotal if the majority forms by the player entering and not before [3].
The pivotal player in each arrangement is shown in brackets. We notice that


K is pivotal in 12 cases
P is pivotal in 4 cases
C is pivotal in 4 cases
N is pivotal in 4 cases.


Thus the Shapley value is

$$\phi(K) = \frac{12}{24} = \frac{1}{2}, \qquad \phi(P) = \phi(C) = \phi(N) = \frac{4}{24} = \frac{1}{6}.$$
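The pivot counts can be reproduced mechanically. The sketch below is an illustration with a hypothetical helper name `pivot_counts`; it uses the four block weights and the quota of 466 from the text.

```python
from itertools import permutations

def pivot_counts(weights, quota):
    """For each ordering, the pivotal player is the one whose entry first reaches the quota."""
    counts = {p: 0 for p in weights}
    for order in permutations(weights):
        total = 0
        for p in order:
            total += weights[p]
            if total >= quota:
                counts[p] += 1
                break
    return counts

weights = {"P": 300, "N": 130, "K": 400, "C": 100}
counts = pivot_counts(weights, quota=466)
print(counts)
```

Dividing each count by $4! = 24$ gives the Shapley value of the corresponding caste; the same function can be applied to the five-caste game with quota 501.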


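Airport Game [14]


Let $N_j$ be the set of landings by planes of length $l_j$ and let there be $n_j$ landings by such planes. Let $N = \bigcup_j N_j$ be the set of all landings. Given any $S \subseteq N$, let $j(S) = \max\{j : S \cap N_j \ne \emptyset\}$; the cost to cover all landings in $S$ is $C_{j(S)}$, with $C(\emptyset) = 0$.

Let

$$R_k = \bigcup_{j=k}^{m} N_j, \qquad r_k = \sum_{j=k}^{m} n_j.$$

Let $v_1, v_2, \ldots, v_m$ be TU games defined by

$$v_k(S) = \begin{cases} 0 & \text{if } S \cap R_k = \emptyset, \\ C_{k-1} - C_k & \text{if } S \cap R_k \ne \emptyset, \end{cases}$$

with $C_0 = 0$. If $k \le j(S)$ then $S \cap R_k \ne \emptyset$, and if $k > j(S)$ then $S \cap R_k = \emptyset$. Thus

$$\sum_{k=1}^{m} v_k(S) = \sum_{k=1}^{j(S)} (C_{k-1} - C_k) = C_0 - C_{j(S)} = -C_{j(S)}.$$

Thus $v = \sum_{k=1}^{m} v_k$ and $\phi(v) = \sum_{k=1}^{m} \phi(v_k)$. Each $v_k$ is a game with carrier $R_k$, namely $v_k(R_k \cap S) = v_k(N \cap S)$ for all $S \subseteq N$ (i.e., $v_k(R_k) = v_k(N)$). Hence

$$\phi_i(v_k) = \begin{cases} 0 & \text{if } i \notin R_k, \\ \dfrac{C_{k-1} - C_k}{r_k} & \text{if } i \in R_k. \end{cases}$$

If $i \in N_j$ then $i \in R_k$ if $k \le j$. Thus the Shapley value is

$$\phi_i(v) = \sum_{k=1}^{j} \frac{C_{k-1} - C_k}{r_k} \qquad \text{if } i \in N_j.$$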
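The airport formula can be illustrated on a small instance. The sketch below works with positive cost shares, i.e. with $-\phi_i(v)$ in the sign convention above, and compares the closed form with a brute-force average of marginal costs over all orderings of the landings. The cost data and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def airport_shares(costs, counts):
    """Cost share of one landing of each type j: sum over k <= j of (C_k - C_{k-1}) / r_k."""
    shares, running, prev = [], Fraction(0), 0
    for k in range(len(costs)):
        r_k = sum(counts[k:])              # landings that need at least runway type k
        running += Fraction(costs[k] - prev, r_k)
        prev = costs[k]
        shares.append(running)
    return shares

def brute_force_shares(costs, counts):
    """Average marginal cost of each individual landing over all orderings."""
    landings = [j for j, c in enumerate(counts) for _ in range(c)]   # type of each landing
    totals = [Fraction(0)] * len(landings)
    for order in permutations(range(len(landings))):
        covered = 0                        # cost of the runway built so far
        for i in order:
            cost = costs[landings[i]]
            if cost > covered:
                totals[i] += cost - covered
                covered = cost
    return [t / factorial(len(landings)) for t in totals]

costs, counts = [6, 10, 12], [2, 1, 1]     # hypothetical data: C_1 < C_2 < C_3 and n_1, n_2, n_3
shares = airport_shares(costs, counts)
brute = brute_force_shares(costs, counts)
print(shares, brute)
```

Each increment $C_k - C_{k-1}$ is split equally among the $r_k$ landings that need a runway of at least type $k$, which is why the shares add up to the cost of the longest runway.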


6.2 Multi-choice Shapley Value ([12])


In many third world countries, landless agricultural laborers find it difficult to form unions to fight rich landlords. Since they have no other skills to sell, they cannot quit the village, and they often survive there with the so-called strategy of absenteeism [4].

Suppose a landlord offers half of the marginal contribution of the last worker as the wage for all workers, treating them as homogeneous workers. The output is a function of their full cooperation. When the landlord hires one sincere worker, they harvest 20 bags of rice. When the landlord hires two sincere workers, they harvest 24 bags of rice. This amounts to $v(L) = 0$, $v(W_1) = v(W_2) = v(W_1 W_2) = 0$, $v(LW_1) = 20$, $v(LW_2) = 20$, $v(LW_1 W_2) = 24$. When each worker gets just 2 bags, half of the last worker's marginal contribution $24 - 20 = 4$, the workers have every reason to show a greater marginal contribution by absenting themselves under various pretexts.

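Assume they pretend to be working sincerely every day but each decides to be absent on $\frac{1}{4}$ of the cultivating days. As an approximation, the expected outputs satisfy $v(L) = 0$, $v(W_1) = v(W_2) = v(W_1 W_2) = 0$, $v(LW_1) = v(LW_2) = 20 \cdot \frac{3}{4} + 0 \cdot \frac{1}{4} = 15$, and

$$v(LW_1W_2) = 24 \cdot \tfrac{3}{4} \cdot \tfrac{3}{4} + 20 \cdot \tfrac{3}{4} \cdot \tfrac{1}{4} \cdot 2 + 0 \cdot \left(\tfrac{1}{4}\right)^2 = \frac{216 + 120}{16} = \frac{336}{16} = 21.$$

Thus they show a higher marginal contribution ($21 - 15 = 6$ bags instead of 4) by absenteeism.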
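The expected outputs above follow from enumerating who is present. A minimal sketch, assuming (as the computation in the text implicitly does) that the workers are absent independently of each other; the function name and argument layout are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_output(full_output, n_workers, p_absent):
    """Expected harvest when each hired worker is independently absent with probability p_absent.

    full_output[k] is the harvest when exactly k workers are present with the landlord.
    """
    total = Fraction(0)
    for present in product([True, False], repeat=n_workers):
        prob = Fraction(1)
        for here in present:
            prob *= (1 - p_absent) if here else p_absent
        total += prob * full_output[sum(present)]
    return total

full_output = {0: 0, 1: 20, 2: 24}
p = Fraction(1, 4)
one = expected_output(full_output, 1, p)
two = expected_output(full_output, 2, p)
print(one, two, two - one)
```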


Motivated by this we introduce, the notion of multi-choice games. 
Example 6 ([12])


Players 1, 2, 3, and 4 jointly own a piece of land. They have 3 possible actions to
take:


$\sigma_0$ : don't work
$\sigma_1$ : work halfheartedly
$\sigma_2$ : work wholeheartedly.


Then let the final output of corn in bushels be given by

$$v(x_1, x_2, x_3, x_4) = \begin{cases} 0 & \text{if } |\{i : x_i = \sigma_2\}| = 0, \text{ i.e., no one works wholeheartedly,} \\ 15 + 3\,|\{i : x_i = \sigma_1\}| & \text{if } |\{i : x_i = \sigma_2\}| = 1, \\ 25 & \text{if } |\{i : x_i = \sigma_2\}| = 2, \\ 30 & \text{if } |\{i : x_i = \sigma_2\}| \ge 3. \end{cases}$$

The main question is what the relative value is for one who works halfheartedly, remains lazy, or works wholeheartedly.


Even voting games have such action choices as the following example illustrates. 


Example 7 ([12])


In a mathematics department with 50 faculty members, 10 professors are distinguished. They have a meeting to vote on promoting a junior colleague. The bylaws stipulate that to attend the meeting, one should read the promotion papers in the department secretary's room ahead of time. On average it takes one hour to go through the promotion papers. Those who want to strongly support the candidate will need another 3 hours to convince colleagues who have queries. Marginal supporters spend two hours following others' presentations. Weakly approving colleagues try to attend the meeting by just minimally satisfying the bylaws. Any change of mind occurs only before the meeting. Final voting is open.
A candidate can be promoted only under one of the following scenarios:


1. At least 40 professors show their marginal or strong support and at least 2 distin- 
guished professors are present at the meeting. 

2. One distinguished professor and 25 professors in all strongly support the candi- 
date. 


Here are the possible actions:


$x_0$ : skip the meeting
$x_1$ : attend but abstain from voting
$x_2$ : support the candidate marginally for promotion
$x_3$ : support strongly


Let $z = (z_1, z_2, \ldots, z_{50})$ represent the action tuple of the professors, where the first 10 coordinates represent the actions of the distinguished professors. Then let

$$v(z) = \begin{cases} 1 & \text{if the candidate is promoted,} \\ 0 & \text{otherwise.} \end{cases}$$

For example, $v(z) = 1$ if $|\{j : z_j = x_1 \text{ or } x_2 \text{ or } x_3,\ 1 \le j \le 10\}| \ge 2$ and $|\{j : z_j = x_2 \text{ or } x_3,\ 1 \le j \le 50\}| \ge 40$.

This is a game with 50 players and 4 actions for each player. How do we measure a professor's influence on the final outcome for the various possible actions?


Definition 6.1 A multi-choice cooperative game in characteristic function form is a pair $(B^n, v)$ defined by $v : B^n \to \mathbb{R}$, where $B = \{0, 1, 2, \ldots, m\}$ and $v(0, 0, \ldots, 0) = 0$.


Definition 6.2 A multi-choice cooperative game is non-decreasing if $x \ge y \Rightarrow v(x) \ge v(y)$.


We call the matrix

$$\phi(v) = (\phi_{ij}(v)), \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n,$$

the power index, where $\phi_{ij}$ is the power of influence of player $j$ when he/she adopts action $i$. We assign $\phi_{0j}(v) = 0$ for all $j \in N$. We associate weights $w_0, w_1, \ldots, w_m$ with actions $0, 1, 2, \ldots, m$ such that $0 = w_0 < w_1 < \cdots < w_m$.

Axiom 1: Let $v$ be of the form

$$v^x(y) = \begin{cases} C > 0 & \text{if } y \ge x, \\ 0 & \text{otherwise.} \end{cases}$$

Then $\phi_{x_i, i}(v^x)$ is proportional to $w(x_i)$.

Axiom 1 states that for binary valued (0 or $C$) games that stipulate a minimal exertion level from players, the reward for a player is proportional to the weight of the player's minimal exertion level.


Definition 6.3 A vector $x^* \in B^n$ is called a carrier of $v$ if $v(x^* \wedge x) = v(x)$ for all $x \in B^n$. We call $x^0$ a minimal carrier of $v$ if $\sum_i x_i^0 = \min\{\sum_i x_i : x \text{ a carrier of } v\}$.


Definition 6.4 We call a player $i$ a dummy if
$v(x \setminus x_i = k) = v(x \setminus x_i = 0)$ for all $x \in B^n$ and for all $k \in B$.


Axiom 2: If $x^*$ is a carrier of $v$, then for $\mathbf{m} = (m, m, \ldots, m)$

$$\sum_{i} \phi_{x_i^*, i}(v) = v(\mathbf{m}).$$

Axiom 3: $\phi(v^1 + v^2) = \phi(v^1) + \phi(v^2)$, where $(v^1 + v^2)(x) = v^1(x) + v^2(x)$.
Axiom 4: Given $x^0 \in B^n$, if $v(x) = 0$ whenever $x \not\ge x^0$, then for each $i \in N$, $\phi_{k,i}(v) = 0$ for all $k < x_i^0$.
This axiom stipulates that people cannot be rewarded when they do not meet even the minimal exertion.


Theorem 6.3 If $w(0) = 0 < w(1) < \cdots < w(m)$ are given, then there exists a unique function $\phi : G \to M_{m \times n}$ satisfying Axioms 1, 2, 3 and 4.


Definition 6.5 Given $B^n$ and $w_0 = 0, w_1, \ldots, w_m$, let

$$\|x\|_w = \sum_{r=1}^{n} w(x_r).$$


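Theorem 6.4 The multi-choice value function $\phi : G \to M_{m \times n}$ that satisfies Axioms 1, 2, 3 and 4 is given by

$$\phi_{ij}(v) = \sum_{k=1}^{i} \sum_{\substack{x_j = k \\ x \ne 0,\ x \in B^n}} \left[ \sum_{T \subseteq M_j(x)} (-1)^{|T|} \frac{w(x_j)}{\|x\|_w + \sum_{r \in T} [w(x_r + 1) - w(x_r)]} \right] [v(x) - v(x - b(j))].$$

(Here $b(j)$ is the unit vector with 1 at the $j$th coordinate.)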


The complicated formula has the usual interpretation of a suitable expectation with 
respect to what we call a Shapley distribution. For the proof details see [12]. 


6.3 Potential Function and Shapley Value ([11])


In an elegant paper, Hart and Mas-Colell introduced the notion of a potential function and showed its intimate connection to the Shapley value.

Given a TU game $(N, v)$ and the set $G$ of all its subgames, let a function $P : G \to \mathbb{R}$ associate a real number $P(N, v)$ to every game $(N, v)$ and to every subgame $(N \setminus i, v)$. The marginal contribution of a player $i$ in the game is defined by

$$D^i P(N, v) = P(N, v) - P(N \setminus i, v).$$

Here $(N \setminus i, v)$ is the subgame obtained by restricting the game to sub-coalitions of $N \setminus i$.


Definition 6.6 A function $P : G \to \mathbb{R}$ with $P(\emptyset, v) = 0$ is called a potential function if it satisfies the following condition

$$\sum_{i \in N} D^i P(N, v) = v(N) \qquad (6.2)$$

for all games $v$ on the fixed player set $N$.


Thus the allocation of marginal contributions according to the potential function adds up exactly to the worth of the grand coalition.

Theorem 6.5 There exists a unique potential function $P$, with resulting payoff vector $(D^i P(N, v))_{i \in N}$. Let $\phi_i(N, v)$ be the Shapley value of the game. Then $D^i P(N, v) = \phi_i(N, v)$ for every game $(N, v)$ and each player $i \in N$.


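Proof We can rewrite (6.2) as

$$P(N, v) = \frac{1}{|N|} \left[ v(N) + \sum_{i \in N} P(N \setminus i, v) \right]. \qquad (6.3)$$

If we start with $P(\emptyset, v) = 0$, it determines $P(N, v)$ recursively. Thus $P$ exists and is uniquely determined by (6.2) and (6.3) applied to $(S, v)$ for all $S \subseteq N$. Next we can express

$$v = \sum_{T \subseteq N} \alpha_T v^T,$$

where the $v^T$ are unanimity games. Let $d_T = \frac{\alpha_T}{|T|}$ and define

$$P(N, v) = \sum_{T \subseteq N} d_T = \sum_{T \subseteq N} \frac{\alpha_T}{|T|}. \qquad (6.4)$$

($d_T$ is often called Harsanyi's dividend to the members of the coalition $T$.)
Now for this $P(N, v)$ we can check that

$$\sum_{i \in N} [P(N, v) - P(N \setminus i, v)] = \sum_{i \in N} \sum_{T \ni i} \frac{\alpha_T}{|T|} = \sum_{T \subseteq N} \alpha_T = v(N).$$

Hence (6.4) defines the unique potential. However,

$$\phi_i(N, v) = \sum_{T \ni i} \frac{\alpha_T}{|T|} = D^i P(N, v),$$

and we are done.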
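The recursion (6.3) is easy to run on a small game. The following sketch is an illustration only: it evaluates the potential of the lawn mowing game of Example 1 recursively and takes differences $P(N, v) - P(N \setminus i, v)$; the names `potential` and `marginals` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

# Lawn mowing game of Example 1 (coalitions not listed have worth 0).
WORTH = {frozenset("R"): Fraction(1, 2), frozenset("RU"): Fraction(1),
         frozenset("RD"): Fraction(3, 4), frozenset("RUD"): Fraction(1)}

def v(S):
    return WORTH.get(frozenset(S), Fraction(0))

@lru_cache(maxsize=None)
def potential(S):
    """P(S, v) from recursion (6.3), with P(empty set) = 0. S must be a frozenset."""
    if not S:
        return Fraction(0)
    return (v(S) + sum(potential(S - {i}) for i in S)) / len(S)

N = frozenset("RUD")
marginals = {i: potential(N) - potential(N - {i}) for i in N}
print(potential(N), marginals)
```

Memoising `potential` keeps the work at one evaluation per coalition, so the recursion costs on the order of $n\,2^n$ operations instead of $n!$.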
We have yet another way to understand the potential function. Suppose a size $s$ is chosen at random from $\{1, 2, \ldots, n\}$. Now restrict attention to coalitions $S$ of size $|S| = s$, and choose one of them at random with probability $1/\binom{n}{s}$.


Proposition 6.2

$$P(N, v) = E\left[ \frac{|N|}{|S|}\, v(S) \right]$$

for every game $(N, v)$.


Proof We have to show

$$P(N, v) = \sum_{S \subseteq N} \frac{(s-1)!\,(n-s)!}{n!}\, v(S).$$

Since the marginal contribution $D^i P$ of this function is the Shapley value, the above is the potential function. While the Shapley value is known to be an expected marginal contribution, we obtain it here as a marginal contribution to an expectation (the potential).
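The two expressions in Proposition 6.2 can be evaluated side by side. The sketch below assumes that the size $s$ is drawn uniformly from $\{1, \ldots, n\}$ (the text only says "at random") and again uses the lawn mowing game of Example 1 as hypothetical test data.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

# Lawn mowing game of Example 1 (coalitions not listed have worth 0).
WORTH = {frozenset("R"): Fraction(1, 2), frozenset("RU"): Fraction(1),
         frozenset("RD"): Fraction(3, 4), frozenset("RUD"): Fraction(1)}

def v(S):
    return WORTH.get(frozenset(S), Fraction(0))

players = "RUD"
n = len(players)

weighted_sum = Fraction(0)     # sum over S of (s-1)!(n-s)!/n! * v(S)
expectation = Fraction(0)      # E[(n/|S|) v(S)] under the two-stage sampling
for s in range(1, n + 1):
    for S in combinations(players, s):
        weighted_sum += Fraction(factorial(s - 1) * factorial(n - s), factorial(n)) * v(S)
        prob = Fraction(1, n) * Fraction(1, comb(n, s))   # uniform size, then uniform coalition
        expectation += prob * Fraction(n, s) * v(S)

print(weighted_sum, expectation)
```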


Remark 6.1 Even though we deal just with one game and its subgames instead 
of all games with the number of players fixed, we still get the Shapley value. The 
data we collect is for any one game and its subgames. To fix uniquely the Shapley 
value axiomatically, we do need the axioms to run over every possible game with N 
players. For further extensions and refinements and to multi-choice games see [13]. 
7.1 Introduction


The Kaczmarz method is a row-action iterative method for solving large-scale linear systems. The randomized version of the Kaczmarz method was presented by Strohmer and Vershynin, who proved that it converges exponentially [23].
Let $A$ be a real matrix and $b$ be a real vector. The linear system given by $A$ and $b$ is

$$Ax = b, \quad \text{with } A \in \mathbb{R}^{m \times n} \text{ and } b \in \mathbb{R}^m, \qquad (7.1)$$

and the rows of the matrix $A$ and the vector $b$ are partitioned as

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}. \qquad (7.2)$$
On each step, the randomized Kaczmarz (RK) method performs the update (7.3), which requires only $O(n)$ storage,

$$x_{k+1} = x_k + \frac{b_i - a_i x_k}{\|a_i\|_2^2}\, a_i^T, \qquad (7.3)$$

where the index $i$ is randomly selected from $1, 2, \ldots, m$.

The update (7.3) is an orthogonal projection of the current iterate onto the hyperplane $\{x : a_i x = b_i\}$, which is parallel to the kernel of the selected row. Equation (7.4) is the block version of the randomized Kaczmarz method (BRK) of Needell and Tropp [20], with the block partitions of the matrix $A$ and the vector $b$,

$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_\tau \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_\tau \end{bmatrix},$$

and

$$x_{k+1} = x_k + A_\tau^\dagger (b_\tau - A_\tau x_k), \qquad (7.4)$$

where $\tau$ selects a set of rows from $1, 2, \ldots, m$ and $A_\tau^\dagger$ is the Moore-Penrose inverse (Wang et al. [26]) of $A_\tau$.

If the size of $\tau$ is larger, then the update is similar to a randomized linear system solver with an appropriate selection approach.

Since RK is a greedy randomized optimization method, it can be combined with Nesterov acceleration in the real setting. The accelerated randomized Kaczmarz method (ARK) converges faster than the original RK and in some cases even faster than the conjugate gradient (CG) method, according to Liu and Wright [13]. Because the randomized Kaczmarz method uses only one row at a time, each step corrects the residual of that single equation only, and convergence can be slow. Using one batch of rows at a time, as in a block version of ARK (BARK), can greatly reduce the number of iterations required for convergence and at the same time bring the behavior of the iteration closer to that of gradient descent. The motivation of this work is to obtain a BARK with a certain method of selecting $\tau$ in (7.4). The convergence of BARK is also proved to be faster than that of BRK.

The method can also be used on real tensor t-product problems, as described in Sect. 7.5. Ma and Molitor [16] gave two t-RK methods in 2021, one working in the Fourier domain and the other computing with a matricized version of the tensor equations. We extend the two methods to t-BARK and illustrate the performance of the algorithms on some numerical examples at the end.
7.2 Randomized Kaczmarz Method


The RK method for solving (7.1) with (7.3) is shown as Algorithm 1. 


Algorithm 1 (Strohmer and Vershynin [23]) (Randomized Kaczmarz)
Input: $A$, $b$, $x_0$ and $l$
1: for $k = 0, 1, \ldots, l$
2: Select $i_k$ from $\{1, 2, \ldots, m\}$
3: Set $x_{k+1} = x_k + \dfrac{b_{i_k} - a_{i_k} x_k}{\|a_{i_k}\|_2^2}\, a_{i_k}^T$
4: end for
Output: $x_l$


There are two common methods to select $i_k$ for each step. By the first one, we select it with probability $\frac{\|a_{i_k}\|_2^2}{\|A\|_F^2}$, and by the second one, we normalize each row as $a_i = \frac{a_i}{\|a_i\|_2}$ before the iteration and select with the same probability. Both selection methods lead to exponential convergence.
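Algorithm 1 with the first selection rule (probability proportional to the squared row norm) can be sketched in a few lines. This is an illustrative pure-Python sketch, not a tuned implementation; the small consistent test system, the function name and the fixed random seed are assumptions made for the example.

```python
import random

def randomized_kaczmarz(A, b, x0, iters, seed=0):
    """Randomized Kaczmarz: project onto the hyperplane of one randomly chosen row per step."""
    rng = random.Random(seed)
    norms2 = [sum(a * a for a in row) for row in A]      # squared row norms ||a_i||^2
    x = list(x0)
    for _ in range(iters):
        # Sample a row index with probability ||a_i||^2 / ||A||_F^2.
        i = rng.choices(range(len(A)), weights=norms2, k=1)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / norms2[i]
        x = [xj + step * a for xj, a in zip(x, row)]     # update (7.3)
    return x

# Hypothetical consistent 3 x 2 system with known solution x_true.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x = randomized_kaczmarz(A, b, [0.0, 0.0], iters=2000)
error = max(abs(u - w) for u, w in zip(x, x_true))
print(x, error)
```

Only the selected row and the current iterate are touched in each step, which is the $O(n)$ storage property mentioned in Sect. 7.1.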


Theorem 7.1 (Strohmer and Vershynin [23, Theorem 2]) Let $x$ be the solution of equation (7.1). Algorithm 1 returns $x_l$ with the expected error

$$E\|x_l - x\|_2^2 \le \left(1 - \kappa(A)^{-2}\right)^{l} \|x_0 - x\|_2^2, \qquad (7.5)$$

where the condition number (Cucker et al. [8]) is

$$\kappa(A) = \|A\|_F \|A^\dagger\|_2. \qquad (7.6)$$

The following equations show how (7.3) is established. We select a row $a_i$ of $A$. Greedily, we find a $y$ to optimize

$$\min_{y} \| a_i (x_k + a_i^T y) - b_i \|_2.$$

The solution of the above subproblem is $y = \frac{b_i - a_i x_k}{\|a_i\|_2^2}$, which gives (7.3). For the block version, we replace $a_i$ with $A_\tau$, where $|\tau| = l$. The subproblem becomes

$$\min_{y \in \mathbb{R}^l} \| A_\tau (x_k + A_\tau^T y) - b_\tau \|_2.$$

The solution is

$$A_\tau^T y = A_\tau^\dagger (b_\tau - A_\tau x_k).$$


Here we obtain the Block Randomized Kaczmarz method (Algorithm 2). 


Algorithm 2 (Needell and Tropp [20]) (Block Randomized Kaczmarz)
Input: $A$, $b$, $x_0$, $h$ and $l$
1: for $k = 0, 1, \ldots, l$
2: Select $\tau_k \in \mathbb{R}^l$ from $\{1, 2, \ldots, m\}$
3: Set $x_{k+1} = x_k + A_{\tau_k}^\dagger (b_{\tau_k} - A_{\tau_k} x_k)$
4: end for
Output: $x_l$


The convergence of BRK is influenced by how the index set $\tau$ is selected.
Definition 7.1 (Needell and Tropp [20]) (Row Paving). An $h$-row paving of a matrix $A$ is a partition $T = \{\tau_1, \tau_2, \ldots, \tau_h\}$ of the row indices, where

$$\alpha \le \lambda_{\min}(A_\tau A_\tau^*) \quad \text{and} \quad \lambda_{\max}(A_\tau A_\tau^*) \le \beta \quad \text{for each } \tau \in T.$$

The number $h$ is the size of the paving, and $\alpha$ and $\beta$ are the lower and upper paving bounds. Notice that the ratio $\beta/\alpha$ bounds the squared condition number $\kappa^2(A_\tau)$ for each $\tau \in T$.


Theorem 7.2 (Needell and Tropp [20, Theorem 1.2]) Let $x$ be the solution of the linear equations (7.1) and $x_l$ be the result of Algorithm 2. It holds that

$$E\|x_l - x\|_2^2 \le \left(1 - \frac{\sigma_{\min}^2(A)}{h\beta}\right)^{l} \|x_0 - x\|_2^2. \qquad (7.7)$$


Every step of Algorithm 2 carries out a basic randomized linear system solver (RandNLA) with random row sampling (Drineas and Mahoney [9]). Algorithm 2 can also be regarded as an iterative version of the basic randomized linear system solver, which implies its convergence and suggests another use: estimating the error of results given by other algorithms at a low cost.
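Algorithm 3 (Liu and Wright [13]) (Accelerated Randomized Kaczmarz)
Input: $A$, $b$, $x_0$, $l$ and $\lambda$
1: Normalize $A$ as $a_i = \frac{a_i}{\|a_i\|_2}$;
2: Check that $\lambda \in [0, \lambda_{\min}]$;
3: Set $v_0 = x_0$, $\gamma_{-1} = 0$;
4: for $k = 0, 1, \ldots, l$;
5: Set $\gamma_k$ the larger root of $\gamma_k^2 - \frac{\gamma_k}{m} = \left(1 - \frac{\gamma_k \lambda}{m}\right) \gamma_{k-1}^2$;
6: Set $\alpha_k = \frac{m - \gamma_k \lambda}{\gamma_k (m^2 - \lambda)}$, $\beta_k = 1 - \frac{\gamma_k \lambda}{m}$;
7: Select $i$ randomly from $\{1, 2, \ldots, m\}$;
8: Set $y_k$, $x_{k+1}$, $v_{k+1}$ as:
   $y_k = \alpha_k v_k + (1 - \alpha_k) x_k$,
   $x_{k+1} = y_k - a_i^T (a_i y_k - b_i)$,
   $v_{k+1} = \beta_k v_k + (1 - \beta_k) y_k - \gamma_k a_i^T (a_i y_k - b_i)$;
9: end for.
Output: $x_l$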

7.3. Block Accelerated RK 


The update of $x_k$ in step 3 of Algorithm 1 can be written as $x_{k+1} = x_k - \theta \nabla f(x_k)$, where $\theta$ is the stepsize and $\nabla f(x)$ is the objective gradient. Nesterov's accelerated procedure (Nesterov [21]) introduces sequences $y_k$ and $v_k$ as

$$\begin{aligned}
y_k &= \alpha_k v_k + (1 - \alpha_k) x_k, \\
x_{k+1} &= y_k - \theta_k \nabla f(y_k), \\
v_{k+1} &= \beta_k v_k + (1 - \beta_k) y_k - \gamma_k \nabla f(y_k).
\end{aligned}$$

The procedure gives better convergence with appropriate $\alpha_k$, $\beta_k$ and $\gamma_k$. The scalars $\alpha_k$, $\beta_k$ and $\gamma_k$ in Algorithm 3 guarantee better convergence than Algorithm 1 with an appropriate choice of $\lambda$ (Liu and Wright [13]).
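The accelerated iteration of Algorithm 3 can be sketched as follows. This is a hedged illustration rather than the reference implementation: it assumes the parameter recurrences $\gamma_k^2 - \gamma_k/m = (1 - \gamma_k\lambda/m)\gamma_{k-1}^2$, $\alpha_k = (m - \gamma_k\lambda)/(\gamma_k(m^2 - \lambda))$ and $\beta_k = 1 - \gamma_k\lambda/m$, rescales $b$ together with the rows of $A$, and uses a small hypothetical test system whose normalized rows give $\lambda_{\min}(A^TA) = 1$, so that $\lambda = 0.5$ is admissible.

```python
import math
import random

def accelerated_rk(A, b, x0, iters, lam, seed=0):
    """Accelerated randomized Kaczmarz sketch with uniform row sampling."""
    rng = random.Random(seed)
    m = len(A)
    # Normalize every row to unit length and rescale b accordingly.
    rows, rhs = [], []
    for row, bi in zip(A, b):
        nrm = math.sqrt(sum(a * a for a in row))
        rows.append([a / nrm for a in row])
        rhs.append(bi / nrm)
    x, v, gamma_prev = list(x0), list(x0), 0.0
    for _ in range(iters):
        # Larger root of g^2 - g/m = (1 - g*lam/m) * gamma_prev^2,
        # i.e. of g^2 + p*g - gamma_prev^2 = 0 with p as below.
        p = lam * gamma_prev ** 2 / m - 1.0 / m
        gamma = (-p + math.sqrt(p * p + 4.0 * gamma_prev ** 2)) / 2.0
        alpha = (m - gamma * lam) / (gamma * (m * m - lam))
        beta = 1.0 - gamma * lam / m
        y = [alpha * vj + (1.0 - alpha) * xj for vj, xj in zip(v, x)]
        i = rng.randrange(m)
        r = sum(a * yj for a, yj in zip(rows[i], y)) - rhs[i]   # a_i y_k - b_i, computed once
        x = [yj - r * a for yj, a in zip(y, rows[i])]
        v = [beta * vj + (1.0 - beta) * yj - gamma * r * a
             for vj, yj, a in zip(v, y, rows[i])]
        gamma_prev = gamma
    return x

# Hypothetical consistent system; after row normalization A^T A has eigenvalues 1 and 2.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x = accelerated_rk(A, b, [0.0, 0.0], iters=1000, lam=0.5)
error = max(abs(u - w) for u, w in zip(x, x_true))
print(x, error)
```

The scalar residual `r` is computed once per step and reused in both updates, which is the saving discussed for step 8.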

In step 8, we can calculate $a_i^T (a_i y_k - b_i)$ only once to make the update less expensive. The total cost of step 8 is $9n$. If parallel computation is possible, the total cost can be reduced to $4n$ by calculating
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Sk = Ai ye — bj, 

8k = SKQij, 

By = (LR eg + Ve) SK, 

Vegy = Cl = my ote xe + = Oe + ma MY) VE 
Yet = Ver — ho 

Xk+1 = Vk — 8k- 


We obtain the block version of ARK as follows by combining Algorithms 2 and 3. The following algorithm (orthogonalization with the Gram-Schmidt process) is called before the iterations of BARK (Block Accelerated Randomized Kaczmarz).


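Algorithm 4 (Orthogonalization)
Input: $A$, $b$ and row paving $T$
1: for $\tau \in T$
2: for $i$ in each $\tau$
3: $a_i = a_i - \sum_{j \in \tau,\ j < i} \langle a_i, a_j \rangle a_j$, $\quad b_i = b_i - \sum_{j \in \tau,\ j < i} \langle a_i, a_j \rangle b_j$
4: $a_i = \frac{a_i}{\|a_i\|_2}$, $\quad b_i = \frac{b_i}{\|a_i\|_2}$ (both divided by the norm of $a_i$ obtained in step 3)
5: end for
6: end for
Output: $A$, $b$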
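Algorithm 4 and the block update (7.4) fit together as sketched below. The sketch is an illustration under assumptions: it uses the modified Gram-Schmidt variant inside each block, relies on the fact that a block with orthonormal rows has $A_\tau^\dagger = A_\tau^T$, and runs on a small hypothetical system with a two-block paving; all function names are made up for the example.

```python
import math
import random

def dot(u, w):
    return sum(a * c for a, c in zip(u, w))

def orthonormalize_blocks(A, b, paving):
    """Gram-Schmidt inside each block of the paving; b receives the same row operations."""
    A = [list(row) for row in A]
    b = list(b)
    for block in paving:
        done = []
        for i in block:
            for j in done:
                c = dot(A[i], A[j])
                A[i] = [ai - c * aj for ai, aj in zip(A[i], A[j])]
                b[i] -= c * b[j]
            nrm = math.sqrt(dot(A[i], A[i]))
            A[i] = [ai / nrm for ai in A[i]]
            b[i] /= nrm
            done.append(i)
    return A, b

def block_kaczmarz(A, b, paving, x0, iters, seed=0):
    """Block update (7.4); with orthonormal rows the pseudoinverse is the transpose."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        block = rng.choice(paving)
        residuals = [b[i] - dot(A[i], x) for i in block]   # b_tau - A_tau x_k
        for i, r in zip(block, residuals):
            x = [xj + r * a for xj, a in zip(x, A[i])]     # add A_tau^T (b_tau - A_tau x_k)
    return x

# Hypothetical consistent 4 x 3 system and a paving with two blocks of two rows.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]]
x_true = [1.0, 2.0, 3.0]
b = [dot(row, x_true) for row in A]
paving = [[0, 1], [2, 3]]

Q, c = orthonormalize_blocks(A, b, paving)
x = block_kaczmarz(Q, c, paving, [0.0, 0.0, 0.0], iters=300)
error = max(abs(u - w) for u, w in zip(x, x_true))
print(x, error)
```

Because the row operations are applied to $b$ as well, the orthogonalized system has the same solution set as the original one, and no pseudoinverse has to be formed inside the iteration.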


Now we obtain BARK as follows. 


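Algorithm 5 (Block Accelerated Randomized Kaczmarz)
Input: $A$, $b$, $x_0$, $l$, $T$ and $\lambda$
1: Orthogonalize $A$ with row paving $T$ by Algorithm 4.
2: Check that $\lambda \in [0, \lambda_{\min}]$;
3: Set $v_0 = x_0$, $\gamma_{-1} = 0$;
4: for $k = 0, 1, \ldots, l$;
5: Set $\gamma_k$ the larger root of $\gamma_k^2 - \frac{\gamma_k}{h} = \left(1 - \frac{\gamma_k \lambda}{h}\right) \gamma_{k-1}^2$, where $h$ is the size of each $\tau \in T$;
6: Set $\alpha_k = \frac{h - \gamma_k \lambda}{\gamma_k (h^2 - \lambda)}$, $\beta_k = 1 - \frac{\gamma_k \lambda}{h}$;
7: Select $\tau$ from $T$;
8: Set $y_k$, $x_{k+1}$, $v_{k+1}$ as:
   $y_k = \alpha_k v_k + (1 - \alpha_k) x_k$,
   $x_{k+1} = y_k - A_\tau^\dagger (A_\tau y_k - b_\tau)$,
   $v_{k+1} = \beta_k v_k + (1 - \beta_k) y_k - \gamma_k A_\tau^\dagger (A_\tau y_k - b_\tau)$;
9: end for.
Output: $x_l$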
7.4 Convergence and Error


Let us discuss the convergence of Algorithm 5. To give the convergence analysis, we
make the following definition.


Definition 7.2 (Shi et al. [22], Zouzias and Freris [31]) Let $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$. The energy norm $\|\cdot\|_{(A^TA)^\dagger}$ is defined by

$$\|x\|^2_{(A^TA)^\dagger} = \langle x, (A^TA)^\dagger x \rangle = x^T (A^TA)^\dagger x. \qquad (7.8)$$

It is trivial that (7.8) yields an energy norm of a vector $x$.


For BARK, we have the following result. The matrix $A$ in our analysis has already been orthogonalized with the row paving $T$ according to Algorithm 4.


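Theorem 7.3 Let $x$ be the solution of (7.2) and $x_k$, $v_k$ be the result of Algorithm 5 with the parameter $\lambda$ and row paving (Definition 7.1) satisfying $\lambda \in [0, \lambda_{\min}]$. Define $\sigma_1 = 1 + \frac{\sqrt{\lambda}}{2h}$ and $\sigma_2 = 1 - \frac{\sqrt{\lambda}}{2h}$; then

$$E\left(\|v_k - x\|^2_{(A^TA)^\dagger}\right) \le \frac{4 \|x_0 - x\|^2_{(A^TA)^\dagger}}{(\sigma_1^k + \sigma_2^k)^2} \qquad (7.9)$$

and

$$E\left(\|x_k - x\|_2^2\right) \le \frac{4 \lambda \|x_0 - x\|^2_{(A^TA)^\dagger}}{(\sigma_1^k - \sigma_2^k)^2}. \qquad (7.10)$$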
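We start our proof with two useful lemmas.
Lemma 7.1 For any $y \in \mathbb{R}^n$, we have

$$E_\tau\left( \|A_\tau^\dagger (b_\tau - A_\tau y)\|^2_{(A^TA)^\dagger} \right) \le \frac{1}{h} \|Ay - b\|_2^2. \qquad (7.11)$$

Proof Since $\tau$ is chosen at random from $T$, we have

$$E_\tau \|A_\tau^\dagger (A_\tau y - b_\tau)\|^2_{(A^TA)^\dagger} = \frac{1}{h} \sum_{\tau \in T} \left\langle (A^TA)^\dagger A_\tau^\dagger (A_\tau y - b_\tau),\ A_\tau^\dagger (A_\tau y - b_\tau) \right\rangle. \qquad (7.12)$$

Since $A$ is orthogonalized with row paving $T$, we have $A_\tau^\dagger = A_\tau^T$, and (7.12) turns into

$$\begin{aligned}
E_\tau \|A_\tau^T (A_\tau y - b_\tau)\|^2_{(A^TA)^\dagger}
&= \frac{1}{h} \sum_{\tau \in T} \left\langle (A^TA)^\dagger A_\tau^T (A_\tau y - b_\tau),\ A_\tau^T (A_\tau y - b_\tau) \right\rangle \\
&= \frac{1}{h} \operatorname{trace}\left( (A^TA)^\dagger \sum_{\tau \in T} A_\tau^T (A_\tau y - b_\tau)(A_\tau y - b_\tau)^T A_\tau \right) \\
&= \frac{1}{h} \operatorname{trace}\left( (A^TA)^\dagger A^T X A \right) = \frac{1}{h} \operatorname{trace}\left( A^\dagger X A \right),
\end{aligned}$$

where $X$ is the block diagonal matrix with diagonal blocks $(A_\tau y - b_\tau)(A_\tau y - b_\tau)^T$ for all $\tau$ in $T$.

We use the singular value decomposition (SVD) of $A$ as $U \Sigma V^T$, so that $(A^TA)^\dagger = V \Sigma^{-2} V^T$, where $U$ and $V$ are orthogonal matrices. Thus

$$\begin{aligned}
E_\tau \|A_\tau^T (A_\tau y - b_\tau)\|^2_{(A^TA)^\dagger}
&= \frac{1}{h} \operatorname{trace}\left( V \Sigma^{-1} U^T X U \Sigma V^T \right) \\
&= \frac{1}{h} \operatorname{trace}\left( U^T X U \right) = \frac{1}{h} \operatorname{trace}(X) \\
&\le \frac{1}{h} \sum_{\tau \in T} \|A_\tau y - b_\tau\|_2^2 = \frac{1}{h} \|Ay - b\|_2^2.
\end{aligned}$$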
Z 


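Lemma 7.2 For any solution $x$ to (7.2) and any $y \in \mathbb{R}^n$, we have

$$E_\tau\left( \|y + A_\tau^\dagger (b_\tau - A_\tau y) - x\|_2^2 \right) = \|y - x\|_2^2 - \frac{1}{h} \|Ay - b\|_2^2, \qquad (7.13)$$

where $\tau$ is selected randomly from $T$.


Proof We have

$$\begin{aligned}
E_\tau\left( \|y + A_\tau^\dagger (b_\tau - A_\tau y) - x\|_2^2 \right)
&= \|y - x\|_2^2 + E_\tau\left( \|A_\tau^\dagger (b_\tau - A_\tau y)\|_2^2 \right) + 2 E_\tau \left\langle y - x,\ A_\tau^\dagger (b_\tau - A_\tau y) \right\rangle \\
&= \|y - x\|_2^2 + E_\tau\left( \|A_\tau^\dagger A_\tau (x - y)\|_2^2 \right) + 2 E_\tau \left\langle y - x,\ A_\tau^\dagger A_\tau (x - y) \right\rangle \\
&= \|y - x\|_2^2 - E_\tau\left( \|A_\tau^\dagger A_\tau (x - y)\|_2^2 \right) \\
&= \|y - x\|_2^2 - \frac{1}{h} \sum_{\tau \in T} \|A_\tau^\dagger (b_\tau - A_\tau y)\|_2^2 \\
&= \|y - x\|_2^2 - \frac{1}{h} \|Ay - b\|_2^2.
\end{aligned}$$

We finish the proof.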
Now we prove Theorem 7.3.


Proof Define 


Y 
ty) i=? h (1 AYg_1) Veet: (7.14) 
For 2 < iui < is y_, = 0, we have 
1 > Ah 
t(0) = -ye_ 1 <0, ¢ ne i Vel m2 1) <0. (7.15) 
If y-1 < we then we have t (y,_1) = — tit (1 _ Ae 4) < 0, and 
1 1 h 
t = Pan 4) v2. 
(=) Le filh ( VYe-1) Ve-1 
1 h Ze oe 
=e a =I (7.16) 


__ Vo Jao 


Equations (7.15) and (7.16) indicate that y%, € [v- 1: |: From how a, f and y 
yield in out algorithm, there are 


l-a, __ hey—h h hyz—ve _ hy2_, 


= haya ve heh Me? 
Ve = = Bey 4 = 0. 
Let r be the error of the algorithm 
rp = |lu, — XIicaray! : (7.17) 
We have 
a 2 
ist = let —2llcyrayt 
. 2 
= Bere + 0 — Ba) 1k — eAT Arve 20) — 2] cr gt 


=UPkrK + 1 Br) 9 AIP gg tHe [AL Ar —bO| arg 


— 2% (ri + (1 — Be) YR — x, (ata)! Al (Ary — b»)) 


2 
= Wire + — Bad) ye Nar gyt FYE [AE Are FO] ar gt 
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1 1 
— 27K | Bx rae 


= [Bere + 0 ~ Be) de Ir ays + 18 [AE Aron — be) 


= t 
“th ) + (I= Be) ye. (ATA) AT Ary - b)) 


2 
(ata)! 


1- Peas 
+ 2y4 (: vk + Bi Ore = 94) (ATA) AE Arve = b»)) . (7.18) 
The first item in (7.18) can be bounded as follows.


Bev + 1 — Br) ye — *W ray! 


< Be llde — xl gr gyt + = Be) Ihde = 2M gray! 


7.19) 
2 Ver 2 ( 
= Bx Il vx = XM caray' + hh llve = XM Caray’ 
Vk 
< Be lle — Ml gray + F lle = #2 
For the second item, recalling Lemmas 7.1 and 7.2, we have 
7 + 2 
ir (k)| (0), t(1)..st k=) (ey (Ary — by)| (arayt) 
(7.20) 


1 
2 _— 2 
=; | Aye — BIS = Myx — x13 — Exaoje@,ra),..ur—1) (Iee+1 — £115) - 


For the third one, we have 


1 


— ak TA\t (at 
Bele Or Daonrh-b (4 Ye + a Bx (Xk vg) (A A) (Ai eve ~ 09) 


1 1 — 0% iat 
=i: yk + Bk Ok ve), (ATA) 2 At (Are ~ bs) 
1 l—az T t T 
= (x — ye + —“* Ge — ne), (ATA) ATA GR -2) 
Ok 
1 1—ag 
S7\e ee Bx (Xk — Yk) + Yk — 
otk 
=o oleae Bx (Xk — Yk» Yk — *) 
Ok 
= = (— iy — x13 + Be (ne — x13 — lye — 13 — lhse — ye 3) 
h 2 orp 2 2 2 
1 1— az 2, l-a 2 2 
= Go ( + aye) live — 2B + So Be (Ieee — 03 — lax — 3) 
2 2 
1 PeYj1 2, PeYe1 2 2 
=-(7+ lye — 113 + (Ix — x13 — lax — yell3) 
(; 2YK a 2M 
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2 2 
1 Beye 2 BRY~_| 2 
< , F 721 
s (; + Oi lye — xllo + on xe — x1l3 (7.21) 


Combining (7.19), (7.20) and (7.21), we obtain 


< Bi |e — 2" gray + = be —3* 


+ y; (vx - x* | — Ex()|2@),70),...0kK-D (xe - x*)) 


2 
— (72 + burs) JoeB + Boks be B 


< Bx |\vx — x*|| (atayt 7 VeEr()|1(0),1(1),...7e—1) (xe - x*[>) + Bevg_y ||xe — Pall 


+ (v2 — * = Beye) I|yx —x*|5 


= Bx ||vx — x* | (atay' ~ VeEx()|t(0),1(1),...7—1) (za — x*[5) + Bevg_1 ||xe — ale 


(7.22) 
Then we define sequences $\{A_k\}$ and $\{B_k\}$ as follows:
2 Br 2 2 p2 
Ay = 0, By = 0, Bo #0, Be = By? tit = Yi Bit: (7.23) 


Let $A_0 = 0$. Since $\beta_k \in (0, 1]$, $B_{k+1} \ge B_k$. From the definition of $\beta$ and $\gamma$, we have


Beye 7 ye Ak _ Ave 
Bx BY = Ye Ve/ he 


Ai = (7.24) 


Now we know that $\{A_k\}$ is increasing. Multiplying (7.22) by $B_{k+1}^2$ on each side, using
Algorithm 5 and (7.24), we obtain


27 2 a qe we 2 
By Er @|c0),10),...t(k-D (rit) =P Ags 4t (k)|t(0),7(1),...,7(K-1) (xs = i) 


< Ber? + A? | xz tle. 


and 


; 2 2 2 * [2 
r(O)r(1)...0(k) (Brake + Ags (ea = I:)) 
> 2 2 
= 17 ()r (1)... 1(kK-D) (Be Sr (k)\t(O),t(1),....0(kK—-1) (ri) 
2 7 2 
FAG Ee) [20),20),...2 =D (ea -x*l))) 


S Ee@ra)...1k-1) (Ber + Aj te — x* i) j 


Recursively, we obtain 
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2 12 2 «||2 
4t(0)t(1)...0(k) (BE re. + Agi (xe x I:)) 


<E.@ (Bir7 + Aj |x — x*[2) 


a 


Borg + Ao || xo —x *|3 = = Bgr}. 


We drop the last term since $A_0 = 0$. We obtain the bounds


Bo Bo 
(bu) Sgr and B([uni -2"I3) se 
c+ 


Now we estimate the growth of $A_k$ and $B_k$. Here we follow Nesterov's proof [21]
and we have B? = Br afi = (1 — yy) Be kL = =(1 _ a ) Bag which implies 


that : 
7 Atti Best = B?,, — B2 = (By + Bett) (Busi — By) - 


Recalling that $B_{k+1} \ge B_k$, we have


Xr 
B.i,>B — Ay}. 
k+1 2 Bet ay etl 


Using the definition of $A$, $B$, $\beta$ and $\gamma$, we obtain


Ait Ags = Vk 
Bey Beas "oh 


Recalling $A_{k+1} \ge A_k$, we get
1 
— Act Bev = Aja — Ag = (Acti + Ag) (Acti — Ag) < 2Ant1 (Acs 


Thus, 


>A Bore 
v3 1+ 5. 


By combining (7.25) and (7.26), we obtain the following estimation: 


k+l 
Ani} | 1 a [Ao 
Bua | 7 * 1 Bo 


NI 


(7.25) 


Ap 


(7.26) 


(727) 
The Jordan canonical form of the matrix in (7.27) becomes


[Rr]-La-a] [all 


with o,; = 1+ Yi gy =1- Vk Thus, we have 


1 
Vv -, 


t. 
Vi V2. 


o, O 
0 o 


i; 


1 i: _111 % |[oi o 1 1 
ai 2 ill 0 ost || VA VA 
| ge 1 (of Lt a E) [Va] 
= 3 oh oe) a on © oft! 
We get 
Agcy = Bo (of*! — 031) (QV), (7.28) 
and 
Buy = (oft! + of*") Bo/2. (7.29) 


Applying (7.28) and (7.29) to (7.22), we obtain 


Allxo—x* lI? ; 
(2 \_ «1/2 Bi 2 (474) 
(ri) = (is 9 en, SRO SoD 
4A ||xo—x* |? 
; «12 Bo .2 (474 o 
E (lle41 —x II3) s zz ,,10 s (of aft ; 


and we complete the proof. 


One difficulty in computing with Algorithm 4 is how to initialize $\lambda$. We can set $\lambda = 0$
to guarantee convergence, but then the method converges slowly. The best $\lambda$ is
its upper bound $\lambda_{\min}$. Theorem 7.3 indicates that we can perform several steps of BRK
before BARK to approximate the best $\lambda$. From Theorem 3, we give an estimation of
$\lambda$ as
eT 
hah 1-( ) . 


where $k_1$ and $k_2$ are iteration counts of the original BRK method. However, in our
experiments, the value given by the above equation is often larger than what we want.
We also give two conservative estimations.


|| Axx, — B13 
|| Axx, — BIl3 


gol. 
k= (i= (ie = a 
Ax, — 513 


and 




1 
| Axx, — bll3 


ny 


The second equation gives us a A less than 
average of the two estimations. 


zB . The third one is the previous square 


7.5 Applications to Tensor Equations 


A third-order tensor is a three-dimensional array, which can be regarded as a series of
matrices of the same size. When we process tensors, such as images and videos,
tensor decompositions or computational methods Che and Wei [3], Wei and Ding
[30] are quite different from matrix methods due to the extra dimension.

The tensor t-product and tensor operations under the t-product Chang and Wei [1, 
2], Che and Wei [4], Che et al. [5], Chen et al. [6], Chen and Qin [7], Lu et al. [14], 
Lund [15], Miao et al. [18, 19], Tarzanagh and Michailidis [25], Wang et al. [27-29] 
have been presented, which build a connection between the tensor decomposition and 
the matrix case. These tools inspire us to tackle the tensor by the matrix approach. 


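Definition 7.3 (Kilmer and Martin [12], Martin et al. [17]) Let $\mathcal{A} \in \mathbb{R}^{m \times n \times k}$ and
$\mathcal{B} \in \mathbb{R}^{n \times s \times k}$. Then the t-product is defined as

\[
\mathcal{C} = \mathcal{A} * \mathcal{B} = \mathrm{fold}\bigl(\mathrm{bcirc}(\mathcal{A}) \cdot \mathrm{unfold}(\mathcal{B})\bigr),
\]

where

\[
\mathrm{bcirc}(\mathcal{A}) = \begin{bmatrix} A_1 & A_k & \cdots & A_2 \\ A_2 & A_1 & \cdots & A_3 \\ \vdots & \vdots & \ddots & \vdots \\ A_k & A_{k-1} & \cdots & A_1 \end{bmatrix}, \qquad
\mathrm{unfold}(\mathcal{B}) = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_k \end{bmatrix},
\]

$A_i$ and $B_i$ denote the frontal slices, and fold is the inverse operation of unfold.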
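The definition can be turned into a few lines of code. The following is a minimal sketch, assuming a tensor is stored as a Python list of its $k$ frontal slices (each a list of rows); the function names mirror the operators above, but the storage layout and the small test tensors are assumptions made for illustration.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def bcirc(A):
    """Block circulant matrix: block (i, j) is the frontal slice A[(i - j) mod k]."""
    k = len(A)
    rows = []
    for i in range(k):
        for r in range(len(A[0])):
            rows.append([x for j in range(k) for x in A[(i - j) % k][r]])
    return rows

def unfold(B):
    """Stack the frontal slices on top of each other."""
    return [row for frontal in B for row in frontal]

def fold(M, k):
    """Inverse of unfold: cut the stacked matrix back into k slices."""
    m = len(M) // k
    return [M[i * m:(i + 1) * m] for i in range(k)]

def t_product(A, B):
    return fold(matmul(bcirc(A), unfold(B)), len(A))

A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]     # a 2 x 2 x 2 tensor
B = [[[1], [0]], [[0], [1]]]                 # a 2 x 1 x 2 tensor
I = [[[1, 0], [0, 1]], [[0, 0], [0, 0]]]     # identity tensor for the t-product
C = t_product(A, B)
print(C)
```

With this layout the first slice of the product is $A_1B_1 + A_2B_2$ and the second is $A_2B_1 + A_1B_2$, which is exactly a circular convolution of the slices.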


We now use the BARK to solve the tensor equations Du and Sun [10], Ma and Molitor 
[16], Tang et al. [24], 


Peevah, Cer™, pepe. (7.30) 


7.5.1 Tensor BARK 


To solve the tensor equation (7.30), we only need to deal with the matrix equation

\[
\mathrm{bcirc}(\mathcal{A}) \cdot \mathrm{unfold}(x) = \mathrm{unfold}(b). \tag{7.31}
\]

Then we have the following algorithm.
Since $\mathrm{bcirc}(\mathcal{A})$ is block circulant, we only need to orthogonalize an $m \times nk$ matrix in
step 1, and each iteration takes $O(nk)$.


7.5.2. Tensor BARK in the Fourier Domain 


The t-product can be computed by means of the discrete Fourier transform (DFT).
Firstly, a block circulant matrix Jin et al. [11], Kilmer and Martin [12], Martin et al.
[17] can be block-diagonalized by the DFT, i.e., $(F_k \otimes I_m)\,\mathrm{bcirc}(\mathcal{A})\,(F_k^{H} \otimes I_n)$ is a
block diagonal matrix.


Algorithm 6 (t-BARK) 

Input: <, b, xo, /, T and y 

1: Orthogonalize beirc(.2).»,; with row paving T by Algorithm 4. 
2: Check that A € [0, Amin]; 

3: Set vp = phil as y_1 = 0; 


4: for t = 0, 1, 
5: Set y, the ae root of ue -#%=-(1- ee aye D3 
where hi is the size of = TE MT 
. hk-yr Vr. 
6: Seta, = ACS) 7 Br = =l|- 
7: nie 1 ar ao {0, 1,...,k — 1}; 


8: Set yy, Xp41, Uri as! 


Ye = a; + 1 — a) x, 
X41 = Vt Al ,(z,1,5 = ArsXt)s 
Ur41 = Br Us + d = Bu) yt _ VA! (bz,1,8 — AzsXt), 


where A;.5 = [Ar;(¢,ns +1: nk), AcG, 1: ns)]; 
9: end for. 
output: x; 


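Definition 7.4 Let $\mathcal{A} \in \mathbb{R}^{m \times n \times k}$. Then

\[
\mathrm{bdiag}(\mathcal{A}) := (F_k \otimes I_m)\,\mathrm{bcirc}(\mathcal{A})\,(F_k^{H} \otimes I_n) =
\begin{bmatrix} \hat{A}_1 & & \\ & \ddots & \\ & & \hat{A}_k \end{bmatrix}, \tag{7.32}
\]

where $F_k$ is the $k \times k$ DFT matrix, i.e.,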


f= (a)s fp SO ee Se dS. 
Then (7.30) can be written as the matrix equation

\[
\mathrm{bdiag}(\mathcal{A}) \cdot \mathrm{bdiag}(x) = \mathrm{bdiag}(b). \tag{7.33}
\]

Here, $\mathrm{bdiag}(\mathcal{A})$ is a block diagonal matrix, so we can solve the systems $\hat{A}_i \hat{x}_i = \hat{b}_i$, $i = 1, \ldots, k$, separately.


Algorithm 7 (t-BARK in the Fourier domain)
Input: ©, b, xo, /, T and y 

1. A = fft(A, [], 3), b = fft(b, [], 3);
2. for i = 1, 2, ..., k
3.    solve A_i x_i = b_i with Algorithm 5;
4. end for
5. x = ifft(x, [], 3);

output: x 
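The reason the slices decouple is the convolution theorem: a DFT along the third mode turns the circular convolution of frontal slices into independent slice-wise products. The sketch below illustrates only this decoupling, not the solver itself; it uses a naive DFT from the standard library in place of `fft`, and the list-of-slices storage and all function names are assumptions for the example.

```python
import cmath

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def dft3(T, inverse=False):
    """Naive DFT along the third mode; T is a list of k frontal slices."""
    k = len(T)
    sign = 1 if inverse else -1
    scale = 1 / k if inverse else 1
    out = []
    for f in range(k):
        w = [cmath.exp(sign * 2j * cmath.pi * f * t / k) for t in range(k)]
        out.append([[scale * sum(w[t] * T[t][r][c] for t in range(k))
                     for c in range(len(T[0][0]))] for r in range(len(T[0]))])
    return out

def t_product_fourier(A, B):
    Ah, Bh = dft3(A), dft3(B)
    Ch = [matmul(Af, Bf) for Af, Bf in zip(Ah, Bh)]   # k independent products
    return [[[z.real for z in row] for row in S] for S in dft3(Ch, inverse=True)]

def t_product_direct(A, B):
    """Circular convolution of slices: C_i = sum_j A_{(i-j) mod k} B_j."""
    k = len(A)
    C = []
    for i in range(k):
        acc = None
        for j in range(k):
            P = matmul(A[(i - j) % k], B[j])
            acc = P if acc is None else [[a + b for a, b in zip(ra, rb)]
                                         for ra, rb in zip(acc, P)]
        C.append(acc)
    return C

A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
B = [[[1], [0]], [[0], [1]]]
direct = t_product_direct(A, B)
fourier = t_product_fourier(A, B)
err = max(abs(d - f) for Sd, Sf in zip(direct, fourier)
          for rd, rf in zip(Sd, Sf) for d, f in zip(rd, rf))
print(err)
```

Because the $k$ products in the Fourier domain do not interact, the corresponding $k$ linear systems can be handed to separate workers.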


7.6 Numerical Examples 


In this section, we present numerical examples for BARK, BRK and ARK. In the
test examples, the algorithms terminate when they reach the maximum number of iterations we set.


Comparison between BRK and BARK. 


First, we concentrate on the nonsingular equations. We generate the test matrix by 
initializing its SVD. The test matrices have different scales, condition numbers and 
ranks. 


Example 1 
The testing matrices are generated by SVD ($A = U\Sigma V^T$) with random $U$, $\Sigma$ and $V$.


From Fig. 7.1, we conclude that BARK(opt) performs much better than BRK.
BARK(auto) can reach a similar convergence rate as $\lambda$ is calculated. In some cases,
BARK(0) is also competitive compared with BRK. Notice that whenA < Age , the line 
shows a periodic feature, which is distinct in Fig. 7.1c. We explain this by Algorithm
3, which gives a periodic list of $\alpha_k$ and $\beta_k$ and thus leads to the periodic
convergence behavior. We also note that when the matrix is extremely well conditioned, that is, it
has a low condition number, BARK is better than the evolved methods.

The RK method can also be used to solve singular equations.


Example 2 


We generate the singular testing matrices as in Example 1. 


The performance is similar to those on the nonsingular equations (Fig. 7.2). 
Fig. 7.1 Performance of considered methods for the nonsingular equations. Here m is the size of
the matrix, / is the block number and « is defined in (7.6) 
Fig. 7.2 Performance of considered methods for singular equations. r is the rank of the matrix

Fig. 7.3 Comparison results between ARK and BARK


7.6.1 Comparison Between ARK and BARK 


We compare the two methods by conducting BARK with different block heights 
(written as b) for the matrices in Example 1. 

To balance iterations and complexity, the x-axis is scaled inversely proportional
to b. Thus, points on the same vertical line have the same computational complexity. In both
cases, all methods with different b have almost the same performance. The greater b
is, the more expensive each step is and the fewer steps the algorithm takes. The difference in
complexity is mainly generated at the step of orthogonalization (Algorithm 4) before
iteration. With parallel calculation, BARK will show its advantage (Fig. 7.3).


7.6.2 T-BARK Performance 


The following figure shows that BARK performs on the tensor equations as well as
it does in the matrix case. The $\lambda$ estimation also works well in the tensor situation.


Example 3 


We consider the testing tensor with each slice of the tensor generated by the same 
method as in Example 1. 


Then we compare t-BARK with and without the FFT (Fig. 7.4). 

In these three figures, the x-axis of t-BARK(fft) is its iteration number times the
third dimension of the tensor, in order to make the complexity the same as that of the other algorithm.
As illustrated above, the convergence rate of both algorithms mainly depends on
$\lambda$. The two methods have different $\lambda$ for the same problem. In Fig. 7.5a, t-BARK(fft)
converges faster than t-BARK, and in Fig. 7.5c, the contrary holds. The experiment
shows that the method with a larger $\lambda$ converges faster. However, t-BARK with the
FFT can be computed in parallel, and on well-conditioned slices it converges quickly,
which means it sometimes has lower complexity to reach a certain error compared
with direct t-BARK.

Fig. 7.5 Comparison of t-BARK with and without the FFT. The testing tensors are uniform random
tensors
Chapter 8
Nullity of Graphs—A Survey and Some
New Results


S. Arumugam, K. Arathi Bhat, Ivan Gutman, Manjunatha Prasad Karantha, 
and Raksha Poojary 
8.1 Basic Results on Nullity


Let $G = (V, E)$ be a graph of order $n$ and let $V(G) = \{v_1, v_2, \ldots, v_n\}$. Since the
adjacency matrix $A(G) = (a_{ij})$ is a symmetric binary matrix, its eigenvalues are
real numbers. The set of all eigenvalues of $G$ is the spectrum of $G$ and is denoted
by $Sp(G)$. The nullity $\eta(G)$ is the algebraic multiplicity of the number zero in the
spectrum of $G$.
In chemistry, in the context of the Hückel molecular orbital theory Coulson et al.
[6], the nullity of molecular graphs of so-called "alternant conjugated hydrocarbons"
(which necessarily are bipartite graphs) plays an outstanding role. Namely, $\eta$ is an
indicator of the stability of the respective chemical compound: $\eta = 0$ is a necessary
(but not sufficient) condition for chemical stability. If $\eta > 0$, then the respective
chemical species is extremely reactive and usually does not exist. For further details
on this, we refer to Cvetković and Gutman [8], Graovac et al. [13], Longuet-Higgins
[23], Yates [34].

If $v$ is an isolated vertex of $G$, then the row corresponding to $v$ in $A(G)$ is the zero
vector and hence $\eta(G) > 0$. Hence, we assume that $G$ is a graph without isolated
vertices. There are some classes of graphs for which the spectrum is known and 
hence the nullity is determined. 


Theorem 8.1 (Cvetković and Gutman [7]) The spectrum of the complete graph $K_n$
consists of $n - 1$ and $-1$ with multiplicity $n - 1$.


Corollary 8.1 $\eta(K_n) = 0$ for all $n \ge 2$.


Theorem 8.2 (Cvetković and Gutman [7]) The spectrum of the path $P_n$ is given by

\[
Sp(P_n) = \left\{ 2\cos\left(\frac{\pi r}{n+1}\right) : 1 \le r \le n \right\}.
\]


Corollary 8.2 (Cvetković and Gutman [7])
\[
\eta(P_n) = \begin{cases} 1 & \text{if } n \text{ is odd} \\ 0 & \text{if } n \text{ is even.} \end{cases}
\]


Theorem 8.3 (Cvetković and Gutman [7]) The spectrum of the cycle $C_n$ is given by

\[
Sp(C_n) = \left\{ 2\cos\left(\frac{2\pi r}{n}\right) : 0 \le r \le n - 1 \right\}.
\]


Corollary 8.3 (Cvetković and Gutman [7])
\[
\eta(C_n) = \begin{cases} 2 & \text{if } n \equiv 0 \pmod 4 \\ 0 & \text{otherwise.} \end{cases}
\]
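These three corollaries are easy to confirm by machine, since for a symmetric matrix the multiplicity of the eigenvalue zero equals $n$ minus the rank. The following is a small sketch, assuming exact rational Gaussian elimination as the rank routine; the helper names are chosen here for illustration and do not come from the cited sources.

```python
from fractions import Fraction

def rank(M):
    """Rank of an integer matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def nullity(A):
    """Nullity of a graph from its symmetric adjacency matrix: n - rank(A)."""
    return len(A) - rank(A)

def complete(n):
    return [[int(i != j) for j in range(n)] for i in range(n)]

def path(n):
    return [[int(abs(i - j) == 1) for j in range(n)] for i in range(n)]

def cycle(n):  # intended for n >= 3
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]

for n in range(3, 9):
    print(n, nullity(complete(n)), nullity(path(n)), nullity(cycle(n)))
```

Exact arithmetic is used on purpose: deciding whether an eigenvalue is exactly zero with floating-point eigenvalue routines would need a tolerance.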


Let $\Phi(G, \lambda) = |\lambda I - A| = \lambda^n + a_1\lambda^{n-1} + \cdots + a_n$ be the characteristic polynomial
of $G$. Sachs' theorem gives a formula for the coefficients $a_i$ from the structure of the
graph. A subgraph $H$ of $G$ is called an elementary subgraph if every component of
$H$ is either $K_2$ or a cycle. Sachs called such subgraphs basic figures.


Theorem 8.4 (Sachs [28]) Let $\Phi(G, \lambda) = \sum_{i=0}^{n} a_i\lambda^{n-i}$ be the characteristic polynomial
of $G$. Then

\[
a_i = \sum_{s \in S_i} (-1)^{c(s)}\, 2^{r(s)}, \tag{8.1}
\]

where $S_i$ is the set of all elementary subgraphs $s$ of order $i$, $c(s)$ is the number of
components in $s$ and $r(s)$ is the number of cycles in $s$.
If $G$ is a bipartite graph, then every cycle in $G$ is an even cycle and hence there is
no elementary subgraph of odd order. Hence, $a_i = 0$ if $i$ is odd, or equivalently the
characteristic polynomial of $G$ contains only the powers $\lambda^{n-i}$ with $i$ even, so that
$\Phi(G, -\lambda) = (-1)^n \Phi(G, \lambda)$. Hence, we have the
following theorem, which is called the pairing theorem.


Theorem 8.5 (Cvetković and Gutman [7]) Let $G$ be a bipartite graph. Then
$\lambda \in Sp(G) \Rightarrow -\lambda \in Sp(G)$.

Since the trace of the adjacency matrix is zero, we have $\sum_{i=1}^{n} \lambda_i = 0$. Hence, Theorem
8.5 gives the following corollary.


Corollary 8.4 (Cvetković and Gutman [7]) Let $G$ be a bipartite graph of odd order.
Then $\eta(G) > 0$.


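Theorem 8.6 (Cvetković and Gutman [7]) Let $T$ be a tree on $n \ge 1$ vertices and let
$m$ be the size of its maximum matching. Then $\eta(T) = n - 2m$.


Proof Since any elementary subgraph of $T$ is a matching in $T$, there is no
elementary subgraph of order $k$ for any $k > 2m$, so $a_k = 0$ for $k > 2m$ by Theorem 8.4.
Moreover, $a_{2m}$ equals $(-1)^m$ times the number of maximum matchings, which is nonzero.
Hence, zero is a root of $\Phi(T, \lambda)$ of multiplicity exactly $n - 2m$ and the
result follows.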


Theorem 8.6 has been generalized as given below. 


Theorem 8.7 (Cvetković and Gutman [7]) If a bipartite graph $G$ with $n \ge 1$ vertices
does not contain any cycle of length $4s$ ($s = 1, 2, \ldots$), then $\eta(G) = n - 2m$, where
$m$ is the size of its maximum matching.


The vertices of a bipartite graph may be numbered so that the adjacency matrix has
the following form:

\[
A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}.
\]

The matrix $B$ is the incidence matrix between the two sets $X$ and $Y$ of vertices of
the bipartite graph.


Theorem 8.8 (Gong and Xu [7]) For the bipartite graph $G$ with $n$ vertices and
incidence matrix $B$, $\eta(G) = n - 2r(B)$, where $r(B)$ is the rank of $B$.
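The formula follows because the block form of $A$ has rank $2r(B)$. A brief sketch of a check is given below, assuming exact rational elimination for the rank; the two test graphs ($K_{2,3}$ and $P_4$ with its vertices split as $X = \{v_1, v_3\}$, $Y = \{v_2, v_4\}$) and the helper names are choices made for this illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def bipartite_adjacency(B):
    """Assemble A = [[0, B], [B^T, 0]] from the |X| x |Y| matrix B."""
    p, q = len(B), len(B[0])
    top = [[0] * p + list(row) for row in B]
    bottom = [[B[i][j] for i in range(p)] + [0] * q for j in range(q)]
    return top + bottom

examples = {
    "K_{2,3}": [[1, 1, 1], [1, 1, 1]],
    "P_4": [[1, 0], [1, 1]],
}
results = {}
for name, B in examples.items():
    A = bipartite_adjacency(B)
    n = len(A)
    results[name] = (n - rank(A), n - 2 * rank(B))   # (direct nullity, formula)
    print(name, results[name])
```

Working with $B$ instead of $A$ roughly halves the size of the matrix whose rank has to be computed.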


The following theorem gives how nullity changes on the removal of a vertex. 


Theorem 8.9 (Gong and Xu [12]) Let $v$ be any vertex of a graph $G$ with an order
of at least 2. Then
\[
\eta(G) - 1 \le \eta(G - v) \le \eta(G) + 1.
\]


It follows from Theorem 8.9 that $\eta(G - v) = \eta(G) - 1$ or $\eta(G)$ or $\eta(G) + 1$. Examples
of all three types are given in Fig. 8.1.

The vertex set $V$ of the graph $G$ can be partitioned into three subsets $V^-$, $V^0$,
$V^+$ depending on how the nullity changes. Let $V^- = \{v : \eta(G - v) = \eta(G) - 1\}$,
$V^0 = \{v : \eta(G - v) = \eta(G)\}$ and $V^+ = \{v : \eta(G - v) = \eta(G) + 1\}$.

We observe that for the complete graph $K_n$, $V = V^0$, for the even path $P_n$, $V =
V^+$, and for the cycle $C_n$ where $n \equiv 0 \pmod 4$, $V = V^-$.
Fig. 8.1 The graphs $G$, $H$, $P_4$ with $\eta(G - v) = \eta(G)$, $\eta(H - v) = \eta(H) - 1$ and $\eta(P_4 - v) =
\eta(P_4) + 1$


Problem 8.1 Characterize graphs for which


1. $V = V^-$
2. $V = V^0$
3. $V = V^+$.


In Gong and Xu [12], the authors considered the effect of the removal of cut vertices 
on the nullity and proved the following results. 


Theorem 8.10 (Gong and Xu [12]) Let $v$ be a cut vertex of a graph $G$ of order $n$ and
$G_1, G_2, \ldots, G_s$ be the set of all components of $G - v$. If there exists a component, say $G_1$,
among $G_1, G_2, \ldots, G_s$ such that $\eta(G_1) = \eta(G_1 + v) + 1$, then

\[
\eta(G) = \eta(G - v) - 1 = \sum_{i=1}^{s} \eta(G_i) - 1.
\]


Theorem 8.11 (Gong and Xu [12]) Let $v$ be a cut vertex of a graph $G$ of order $n$
and $G_1$ be a component of $G - v$. If $\eta(G_1) = \eta(G_1 + v) - 1$, then

\[
\eta(G) = \eta(G_1) + \eta(G - G_1).
\]


The authors Mohiaddin and Sharaf [25] considered the effect of identifying two 
vertices on the nullity of the graph. Two vertices u and v are said to be identified 
if they are replaced by a vertex w such that N(w) = N(u) U N(v), where N(v) 
denotes the open neighborhood of the vertex v. The authors have proved that in 
general, identifying any pair of distinct vertices cannot increase or decrease the 
nullity by more than 2. 


8.2 Bounds for the Nullity of a Graph 


In this section, we present bounds for the nullity of G in terms of other graph theoretic 
parameters such as clique number, girth and diameter. The following observation 
gives a tool for obtaining such bounds. 
Observation 8.1 Let $G = (V, E)$ be a graph of order $n$. Let $H$ be an induced subgraph
of $G$. Clearly $r(H) \le r(G)$, where $r(G)$ is the rank of the adjacency matrix
of $G$. Since $\eta(G) + r(G) = n$ and $\eta(H) + r(H) = |V(H)|$, it follows that

\[
\eta(G) \le n + \eta(H) - |V(H)|. \tag{8.2}
\]

Hence if $G$ contains an induced subgraph $H$ whose nullity is known, then it can be
used to get an upper bound for $\eta(G)$.
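Written out, the step from the rank inequality to (8.2) is a one-line chain. The second line is a small worked instance; the choice of an induced $C_4$ is only an illustration and uses Corollary 8.3.

```latex
\begin{align*}
\eta(G) &= n - r(G) \le n - r(H) = n - \bigl(|V(H)| - \eta(H)\bigr) = n + \eta(H) - |V(H)|, \\
H = C_4 \text{ induced in } G \;&\Longrightarrow\; \eta(G) \le n + 2 - 4 = n - 2.
\end{align*}
```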


A clique in $G$ is a complete subgraph of $G$. The clique number $\omega(G)$ is the maximum
order of a clique in $G$. Hence if $K_p$ is a subgraph of $G$, then $p \le \omega(G)$.


Theorem 8.12 (Cheng and Liu [4]) Let $G$ be a graph of order $n$ and let $K_p$ be a
subgraph of $G$ with $2 \le p \le n$. Then $\eta(G) \le n - p$.


Proof The result follows from Corollary 8.1 and Observation 8.1.


Corollary 8.5 (Cheng and Liu [4]) Let $G$ be a graph of order $n$. Then $\eta(G) \le
n - \omega(G)$.


The girth of $G$, denoted by $gir(G)$, is the length of a shortest cycle in $G$. The diameter
of $G$, denoted by $diam(G)$, is the maximum distance between any two vertices.
Hence, a cycle of length $gir(G)$ and a diametral path are induced subgraphs of $G$.
Also, the nullity of a path and cycle are known (Corollary 8.2 and Corollary 8.3).
Hence, using (8.2), we get the following bounds for $\eta(G)$.


Theorem 8.13 (Cheng and Liu [4]) Let $G$ be a graph of order $n$ and let the cycle
$C_p$ be an induced subgraph of $G$, where $3 \le p \le n$. Then

\[
\eta(G) \le \begin{cases} n - p + 2 & \text{if } p \equiv 0 \pmod 4 \\ n - p & \text{otherwise.} \end{cases}
\]


Corollary 8.6 (Cheng and Liu [4]) Let $G$ be a graph of order $n$ having at least one
cycle. Then

\[
\eta(G) \le \begin{cases} n - gir(G) + 2 & \text{if } gir(G) \equiv 0 \pmod 4 \\ n - gir(G) & \text{otherwise.} \end{cases}
\]


Theorem 8.14 (Cheng and Liu [4]) Let $G$ be a graph of order $n$ and let the path $P_p$
be an induced subgraph of $G$. Then

\[
\eta(G) \le \begin{cases} n - p + 1 & \text{if } p \text{ is odd} \\ n - p & \text{otherwise.} \end{cases}
\]

Corollary 8.7 (Cheng and Liu [4]) Let $G$ be a connected graph of order $n$. Then

\[
\eta(G) \le \begin{cases} n - diam(G) + 1 & \text{if } diam(G) \text{ is odd} \\ n - diam(G) & \text{otherwise.} \end{cases}
\]
The authors Mustapha and Hansen [26] have given a necessary condition for graphs
to satisfy $\eta(G) = n - diam(G)$.

The bounds given in Corollaries 8.6 and 8.7 can be improved. Let $P_k =
(v_1, v_2, \ldots, v_k)$ be a path in $G$. An edge $v_iv_j$ where $|i - j| \ge 2$ is called a chord
of the path. A path having no chords is called a monophonic path. Thus, a monophonic
path is simply a chordless path or equivalently an induced path in a graph.
The monophonic distance $d_m(u, v)$ is the length of a longest $u$-$v$ monophonic
path in $G$ Santhakumaran and Titus [29]. The monophonic diameter $mdiam(G)$
is defined by $mdiam(G) = \max\{d_m(u, v) : u, v \in V\}$. The maximum length of
an induced cycle in $G$ is called the upper girth of $G$ and is denoted by $ugir(G)$.
Clearly, $mdiam(G) \ge diam(G)$ and $ugir(G) \ge gir(G)$, and the difference between
these parameters may be arbitrarily large as shown in the following example. Let
$G$ be the graph obtained from the cycle $C_n = (v_1, v_2, \ldots, v_n, v_1)$ by adding the
edge $v_1v_3$ where $n \ge 5$. Then $gir(G) = 3$, $ugir(G) = n - 1$, $diam(G) = \lfloor n/2 \rfloor$ and
$mdiam(G) = n - 2$.

Observation 8.1 leads to the following new bounds for $\eta(G)$, which improve
the bounds given in Corollaries 8.6 and 8.7.


Theorem 8.15 Let $G$ be a connected graph of order $n$, having at least one cycle.
Then

\[
\eta(G) \le \begin{cases} n - ugir(G) + 2 & \text{if } ugir(G) \equiv 0 \pmod 4 \\ n - ugir(G) & \text{otherwise.} \end{cases}
\]


Theorem 8.16 Let $G$ be a connected graph of order $n$. Then

\[
\eta(G) \le \begin{cases} n - mdiam(G) + 2 & \text{if } mdiam(G) \text{ is odd} \\ n - mdiam(G) & \text{otherwise.} \end{cases}
\]


There are several bounds for nullity in terms of the maximum degree $\Delta$.


Theorem 8.17 (Fiorini [11]) For a tree $T$ with order $n$ and maximum degree $\Delta$,
$\eta(T) \le n - 2\left\lceil \frac{n-1}{\Delta} \right\rceil$, and the upper bound can be attained.


A result analogous to Theorem 8.17 has been proved for bipartite graphs. 


Theorem 8.18 (Omidi [27]) Let $G$ be a bipartite graph with $n$ vertices, $m$ edges
and maximum vertex degree $\Delta$. If $G$ does not have as subgraph any cycle whose size
is divisible by 4, then $\eta(G) \le n - 2\left\lceil \frac{m}{\Delta} \right\rceil$.


However, the problem of establishing an upper bound in terms of order and maximum 
degree for an arbitrary graph was left open for about ten years. Recently, this problem 
was solved by Zhou et al. [35]. 


Theorem 8.19 (Zhou et al. [35]) Let $G$ be a graph with order $n$ and maximum degree
$\Delta \ge 1$. If $G$ has no isolated vertex, then $\eta(G) \le \frac{\Delta - 1}{\Delta}\, n$, and equality holds if and
only if $G$ is the disjoint union of some copies of $K_{\Delta,\Delta}$.


The authors Zhou et al. [35] posed a conjecture for further study. 
Conjecture 8.1 If $G$ is assumed to be connected, then the upper bound of $\eta(G)$ can
be improved to $\frac{(\Delta - 2)n + 2}{\Delta - 1}$, and the upper bound is attained if and only if $G$ is a cycle
$C_n$ with $n$ divisible by 4 or a complete bipartite graph with equal size of chromatic
sets.


The authors Wang and Geng [32] have proved the above conjecture. 
Characterization of graphs which attain the various bounds given in this section 
is an open problem. 


8.3. Energy and Nullity 


In the 1930s, Hückel put forward a method for finding approximate solutions of a
class of organic molecules, namely, unsaturated conjugated hydrocarbons. Within
this theory, the concept of total electron energy (called "total $\pi$-electron energy")
was elaborated Coulson et al. [6], Graovac et al. [13], Yates [34], which plays an
important role in chemistry. This motivated Gutman to introduce the concept of
energy of a graph Gutman [14], which in the general case differs from Hückel's total
$\pi$-electron energy, but coincides with it for the majority of chemically relevant graphs.
From a mathematical point of view, Gutman's graph energy has a much simpler form
than Hückel's total energy, and it can be straightforwardly applied to all graphs.

As a consequence, the concept of energy of a graph was put forward in Gutman 
[14]. 


Definition 8.1 Let $Sp(G) = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ be the spectrum of a graph $G$ of order
$n$. Then $\mathcal{E}(G) = \sum_{i=1}^{n} |\lambda_i|$ is called the energy of $G$.
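Two small instances, computed from the spectra in Theorems 8.1 and 8.3, show how energy and nullity can be read off together; the choice of $K_n$ and $C_4$ is only for illustration.

```latex
\begin{align*}
Sp(K_n) &= \{n-1, \underbrace{-1, \ldots, -1}_{n-1}\}: & \mathcal{E}(K_n) &= (n-1) + (n-1)\cdot 1 = 2(n-1), & \eta(K_n) &= 0, \\
Sp(C_4) &= \{2, 0, 0, -2\}: & \mathcal{E}(C_4) &= 2 + 0 + 0 + 2 = 4, & \eta(C_4) &= 2.
\end{align*}
```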


Thus, the energy of a graph $G$ is the sum of the absolute values of the eigenvalues of
$G$. A long-standing open problem in the theory of graph energy is to determine the
relation between the energy $\mathcal{E}(G)$ and nullity $\eta(G)$. From a quantum chemical point
of view, the greater $\eta(G)$ is, the weaker the bonding in the underlying
molecule should be, and consequently, the smaller $\mathcal{E}(G)$ should be. Thus, the following has
been suggested by arguments from quantum chemistry.


Conjecture 8.2 $\mathcal{E}(G)$ is a decreasing function of $\eta(G)$.


Gutman and Triantafillou [18] suggested one formulation of the above conjectured
property. If $\eta(G_1) > \eta(G_2)$, then $\mathcal{E}(G_1) < \mathcal{E}(G_2)$, where it must be assumed that
all other structural features of the graphs $G_1$ and $G_2$ influencing the values of their
energies are the same or differ negligibly. The energy $\mathcal{E}(G)$ depends mainly on the
number of vertices, number of edges and the number, type and mutual arrangements
of cycles. Hence, Gutman and Triantafillou [18] considered a family of banana trees
of order $n$.


Definition 8.2 Let $\{K_{1,n_1}, K_{1,n_2}, \ldots, K_{1,n_p}\}$ be a set of $p$ stars where $p \ge 1$ and
$n_i \ge 3$. The tree obtained by adding a new vertex $v$ and joining $v$ to one leaf of each
star is called a banana tree.

If $n_i = k$ for each $i$, then the tree has $pk + 1$ vertices and is denoted by $B(p, k)$.
Let $\mathcal{B}(n) = \{B(p, k) : p \ge 1 \text{ and } k \ge 3\}$. Thus $\mathcal{B}(n)$ is a family of banana trees of
order $n$ constructed from isomorphic stars.


Theorem 8.20 (Gutman and Triantafillou [18]) For a fixed number of vertices $n$,
the energies of the banana trees $B \in \mathcal{B}(n)$ decrease as their nullities increase.


It has been pointed out by Gutman and Triantafillou [18] that if the stars in the 
construction of a banana tree are not isomorphic, then the analytical treatment of the 
nullity-dependence of graph energy as given in Theorem 8.20 is not feasible. 

Another work Gutman et al. [19] investigated the validity of the above conjectured
property of dependence of $\mathcal{E}(G)$ on $\eta(G)$ by means of computer-aided numerical
calculations, as applied to a suitably constructed class of graphs.

Recently, a method for approaching this problem has been put forward in Gutman
[15, 16]. In Gutman [15] pairs of bipartite graphs $G_1$, $G_2$ have been constructed,
called "siblings", such that $\eta(G_1) = 0$ and $\eta(G_2) = 2$ and the difference between
their characteristic polynomials is constant. The difference between the energy of
siblings is just the energy-effect of nullity. For non-sibling graphs, the above conjectured
property in general remains open and the proof of its validity for any family of
chemical graphs would be of significant value for theoretical chemistry.


8.4 Graphs with Nullity 0 and 1 


In the context of the Hückel Molecular Orbital model, if $\eta(G) > 0$ for the molecular
graph $G$, then the corresponding chemical compound is highly reactive and unstable.
Hence, characterization of graphs for which $\eta(G) = 0$ or equivalently $\eta(G) > 0$ is a
significant problem and it was first posed in 1957 by Von and Sinogowitz [31]. This
problem still remains unsolved. In this section, we present some of the significant
results on graphs with $\eta(G) = 0$ or 1.


Observation 8.2 From Corollaries 8.1, 8.2 and 8.3, we get


(i) $\eta(K_n) = 0$ for all $n \ge 2$.
(ii) $\eta(P_n) = 0$ if and only if $n$ is even.
(iii) $\eta(C_n) = 0$ if and only if $n \not\equiv 0 \pmod 4$.


Ashraf and Bamdad [1] have given infinite families of graphs with nullity zero. 


Theorem 8.21 (Ashraf and Bamdad [1]) Letn > 4 and2<t<n-—1. LetG= 
K, — E(K,,). Then n(G) = 0. 


Theorem 8.22 (Ashraf and Bamdad [1]) Let $n$ and $m$ be positive integers such that
$n \ge 5$, $\lceil n/2 \rceil \le m \le \binom{n}{2}$, $m \ne \binom{n}{2} - 1$ and $m \ne \lceil n/2 \rceil$ for odd $n$. Then there exists a
graph $G$ of order $n$ and size $m$ with $\eta(G) = 0$.
Fig. 8.2 The graph $G_r$ with $\eta(L(G_r)) = r + 1$


Sciriha [30] has given a method for constructing families of graphs with $\eta(G) = 1$.
Consideration of line graphs of trees leads to another family of graphs with nullity 
one. 


Definition 8.3 Let $G = (V, E)$ be a graph. Then the graph $L(G)$ with vertex set $E$ in
which two elements $e_1, e_2 \in E$ are adjacent in $L(G)$ if they are adjacent edges in $G$
is called the line graph of $G$.


In general, the nullity of line graphs may assume any positive integer. For example, if
$G = mK_2$, then $L(G) = \overline{K}_m$ and hence $\eta(L(G)) = m$. Even if we restrict ourselves
to connected graphs, the nullity of the line graph can still be any positive integer.
Consider the graph $G_r$ given in Fig. 8.2. Then $\eta(L(G_r)) = r + 1$.


Gutman and Sciriha [17] proved the following theorem.

Theorem 8.23 If $T$ is a tree, then $\eta(L(T)) = 0$ or 1.


We observe that if $T = P_n$, then $L(T) = P_{n-1}$ and hence
\[
\eta(L(T)) = \begin{cases} 0 & \text{if } n \text{ is odd} \\ 1 & \text{if } n \text{ is even.} \end{cases}
\]

Problem 8.2 Characterize trees $T$ for which $\eta(L(T)) = 0$ or equivalently
$\eta(L(T)) = 1$.


More generally, the characterization of graphs with nullities 0 and | is a fundamental 
unsolved problem. 


8.5 Graph Operations Preserving Nullity 


Three transformations which preserve nullity have been reported in Cvetković and
Gutman [7] and Ma et al. [24].


$T_1$: Let $G$ be any graph and suppose $G$ contains $P_4 = (v_1, v_2, v_3, v_4)$ such that
$\deg(v_2) = \deg(v_3) = 2$ in $G$. Let $H$ be the graph obtained from $G$ by removing
$v_2$ and $v_3$ and joining $v_2$ and $v_4$. Then $\eta(G) = \eta(H)$.

$T_2$: Suppose $G$ contains a cycle $C_4 = (v_1, v_2, v_3, v_4, v_1)$ such that $\deg(v_2) = \deg(v_3)
= 2$. Let $H$ be the graph obtained from $G$ by removing the vertices $v_2$, $v_3$ and
the edge $v_1v_4$. Then $\eta(G) = \eta(H)$.

$T_3$: Let $G$ be a graph with $\delta(G) = 1$. Let $\deg(v) = 1$ and let $u$ be the unique neighbor
of $v$. Let $H = G - \{u, v\}$. Then $\eta(G) = \eta(H)$.
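Transformation $T_3$ alone already yields a simple procedure for forests: keep deleting a pendant vertex together with its neighbor, and count the isolated vertices that remain, since each of them contributes one to the nullity. The sketch below assumes the input is a forest given by a vertex count and an edge list; the function name and the representation are choices made for this illustration.

```python
def forest_nullity(n, edges):
    """Nullity of a forest on vertices 0..n-1 by repeated use of T_3."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    while True:
        leaf = next((v for v in adj if len(adj[v]) == 1), None)
        if leaf is None:
            break                         # only isolated vertices are left
        (u,) = adj[leaf]                  # unique neighbor of the pendant vertex
        for w in (leaf, u):               # delete both, as T_3 prescribes
            for x in adj.pop(w):
                if x in adj:
                    adj[x].discard(w)
    return len(adj)                       # each isolated vertex adds 1

paths = {n: forest_nullity(n, [(i, i + 1) for i in range(n - 1)]) for n in range(1, 9)}
star = forest_nullity(4, [(0, 1), (0, 2), (0, 3)])   # K_{1,3}
print(paths, star)
```

Each removed pair is an edge of a matching, so the count returned agrees with $n - 2m$ from Theorem 8.6.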


Using the above transformations, we compute the nullity of some classes of 
graphs. 

The Cartesian product $G \square H$ of two graphs $G$ and $H$ is the graph with vertex
set $V(G) \times V(H)$ and two vertices $(u_1, v_1)$ and $(u_2, v_2)$ are adjacent if $u_1 = u_2$ and
$v_1v_2 \in E(H)$ or $v_1 = v_2$ and $u_1u_2 \in E(G)$.


Theorem 8.24 Let $G = P_2 \square P_n$ where $n \ge 2$. Then
\[
\eta(G) = \begin{cases} 2 & \text{if } n \equiv 2 \pmod 3 \\ 0 & \text{otherwise.} \end{cases}
\]


Proof If $n = 2$, then $G = C_4$ and $\eta(G) = 2$. If $n = 3$, the graph obtained by using
$T_2$ is $P_4$. Hence $\eta(P_2 \square P_3) = \eta(P_4) = 0$. If $n = 4$, the graph obtained by using $T_2$,
$T_3$ is $P_4$. Hence $\eta(P_2 \square P_4) = \eta(P_4) = 0$.

We claim that for $n \ge 5$, $\eta(P_2 \square P_n) = \eta(P_2 \square P_{n-3})$. In fact if $G = P_2 \square P_n$, let $G_1$
be the graph obtained from $G$ by applying $T_2$. The graph $G_1$ has 2 vertices of degree
1 and we apply $T_3$ for both the vertices of degree 1. The resulting graph is $P_2 \square P_{n-3}$.
Since the transformations preserve nullity, it follows that $\eta(P_2 \square P_n) = \eta(P_2 \square P_{n-3})$.

Hence the result follows.
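The period-3 pattern of the theorem can be confirmed directly for small ladders. The following sketch builds the Cartesian product from the adjacency rule above and computes the nullity as $n$ minus an exact rank; the helper names and the range of $n$ tested are assumptions for this illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def nullity(A):
    return len(A) - rank(A)

def path(n):
    return [[int(abs(i - j) == 1) for j in range(n)] for i in range(n)]

def cartesian(A, B):
    """(u1,v1) ~ (u2,v2) iff u1 = u2 and v1 ~ v2, or v1 = v2 and u1 ~ u2."""
    idx = [(u, v) for u in range(len(A)) for v in range(len(B))]
    return [[int((u1 == u2 and B[v1][v2]) or (v1 == v2 and A[u1][u2]))
             for (u2, v2) in idx] for (u1, v1) in idx]

ladder = {n: nullity(cartesian(path(2), path(n))) for n in range(2, 12)}
print(ladder)
```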


The corona of two vertex disjoint graphs $G_1$ and $G_2$ is the graph $G = G_1 \circ G_2$
obtained from one copy of $G_1$ and $|V(G_1)|$ copies of $G_2$ by joining the $i$th vertex of
$G_1$ to all the vertices in the $i$th copy of $G_2$.


Theorem 8.25 $\eta(G \circ K_1) = 0$ for any connected graph $G$.


Proof Let $H = G \circ K_1$. The graph obtained from $H$ using $T_3$ for any pendant vertex
$u$ with neighbor $v$ is the graph $(G - v) \circ K_1$.

By repeatedly applying the above transformation $(n - 2)$ times, we get the graph
$P_2 \circ K_1 = P_4$. Therefore, $\eta(G \circ K_1) = \eta(P_4) = 0$.
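A quick check of the theorem on a few base graphs is sketched below. It assumes the vertices of $G \circ K_1$ are ordered so that its adjacency matrix has the block form $\begin{bmatrix} A & I \\ I & 0 \end{bmatrix}$; the helper names and the test graphs are choices made for this example.

```python
from fractions import Fraction

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def nullity(A):
    return len(A) - rank(A)

def corona_k1(A):
    """Attach one pendant vertex to every vertex: adjacency [[A, I], [I, 0]]."""
    n = len(A)
    top = [list(A[i]) + [int(i == j) for j in range(n)] for i in range(n)]
    bottom = [[int(i == j) for j in range(n)] + [0] * n for i in range(n)]
    return top + bottom

base = {
    "K_3": [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
    "C_4": [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]],
    "K_{1,3}": [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
}
before = {name: nullity(A) for name, A in base.items()}
after = {name: nullity(corona_k1(A)) for name, A in base.items()}
print(before, after)
```

In this block form the determinant is $\pm 1$ whatever $A$ is, which is another way to see that the nullity drops to zero.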


Remark 8.1 Theorem 8.25 shows that every graph can be embedded in a graph with
nullity zero. Hence there is no forbidden subgraph characterization for graphs with
nullity zero. Also since the corona of a disconnected graph with $K_1$ is the disjoint
union of the graphs obtained as corona of each component with $K_1$ and the adjacency
matrix of a disconnected graph is the direct sum of the adjacency matrices of the
components, it follows that Theorem 8.25 is true for disconnected graphs as well.


Theorem 8.26 Let $G$ be a graph with $\Delta(G) = n - 1$. Let $k$ denote the number
of vertices with degree $n - 1$. Let $H$ be a graph obtained from $G$ by attaching
one pendant vertex at each vertex $v$ with a degree less than $n - 1$. Then
\[
\eta(H) = \begin{cases} 1 & \text{if } k = 1 \\ 0 & \text{otherwise.} \end{cases}
\]


Proof By applying $T_3$ for each pendant vertex, we obtain the complete graph $K_k$.
Hence the result follows.




Definition 8.4 A threshold graph is a graph constructed from a one vertex graph by 
repeated applications of the following two operations: 


1. Addition of a single isolated vertex to the graph. 
2. Addition of a single dominating vertex to the graph, i.e., a single vertex that is 
adjacent to all other vertices. 


Theorem 8.27 Let $G_n$ be a connected threshold graph in which $v_i$ is added as an
isolated vertex if $i$ is odd and dominating if $i$ is even. Then $\eta(G_n) = 0$.


Proof Let $V(G_n) = \{v_1, v_2, \ldots, v_n\}$ where the vertices are listed in the order in
which they were added. Therefore, $\deg(v_n) = n - 1$ and $\deg(v_{n-1}) = 1$. Applying
$T_3$ for $G_n$ at the vertex $v_{n-1}$, we obtain $G_{n-2}$. Repeated application of $T_3$ gives $K_2$.
Since $\eta(K_2) = 0$, the result follows.
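The construction in Theorem 8.27 is easy to build explicitly. The sketch below assumes vertices are added in the order $v_1, v_2, \ldots, v_n$ with $n$ even (so that the graph is connected); `alternating_threshold` and `nullity` are hypothetical helper names.

```python
from fractions import Fraction

def nullity(adj):
    """Return n - rank(adj), using exact rational Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in adj]
    n, rank = len(m), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank

def alternating_threshold(n):
    """Threshold graph on v_1..v_n: odd steps add an isolated vertex,
    even steps add a vertex adjacent to every earlier vertex."""
    a = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):          # v_i is added at step i
        if i % 2 == 0:                 # dominating vertex
            for j in range(1, i):
                a[i - 1][j - 1] = a[j - 1][i - 1] = 1
    return a

for n in (2, 4, 6, 8):
    print(n, nullity(alternating_threshold(n)))
```

For even $n$ the last vertex is dominating, which is what makes the graph connected and $v_{n-1}$ pendant.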


Definition 8.5 A half graph $G_n$ is a graph with $2n$ vertices $v_1, v_2, \ldots, v_n$ and
$u_1, u_2, \ldots, u_n$ obtained by joining $u_i$ with $v_j$ by an edge whenever $i \le j$.


Theorem 8.28 Let $G_n$ be a half graph of order $2n$. Then $\eta(G_n) = 0$.


Proof Let $V(G_n) = V \cup U$, where $V = \{v_1, v_2, \ldots, v_n\}$, $U = \{u_1, u_2, \ldots, u_n\}$ and
$N(u_i) = \{v_i, v_{i+1}, \ldots, v_n\}$. Applying $T_3$ to $G_n$ at the pendant vertex $v_1$, we obtain
$G_{n-1}$. Repeated application of $T_3$ gives $K_2$. Hence $\eta(G_n) = 0$.
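A quick numerical check of Theorem 8.28 is sketched below. It assumes the adjacency used in the proof, $N(u_i) = \{v_i, v_{i+1}, \ldots, v_n\}$; the function names are illustrative only.

```python
from fractions import Fraction

def nullity(adj):
    """Return n - rank(adj), using exact rational Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in adj]
    n, rank = len(m), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank

def half_graph(n):
    """Vertices 0..n-1 are v_1..v_n, vertices n..2n-1 are u_1..u_n;
    u_i is adjacent to v_j whenever i <= j."""
    a = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(i, n):
            a[n + i][j] = a[j][n + i] = 1
    return a

print([nullity(half_graph(n)) for n in range(1, 6)])
```

With this labeling the biadjacency matrix between $U$ and $V$ is triangular with ones on the diagonal, which gives another way to see that the adjacency matrix is nonsingular.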


We now define a family of graphs $\mathcal{F} = \{G_1, G_2, \ldots, G_n, \ldots\}$ starting from the
graphs $G_1$ and $H$ given in Fig. 8.3.

For $n \ge 2$, the graph $G_n$ is obtained from $G_{n-1}$ by attaching the right-most vertical
side of $H$ to the vertical side of $G_{n-1}$ on the left side. The graph $G_2$ is given in Fig. 8.4.


Fig. 8.3 The graphs $G_1$ and $H$


Fig. 8.4 The graph $G_2$


Fig. 8.5 Molecular graph of cholesterol compound


Theorem 8.29 The nullity of $G_n$ is 0 for all $G_n \in \mathcal{F}$.


Proof It follows from the definition of $\mathcal{F}$ that the number of cycles in $G_n$ is $3n + 1$.

The proof is by induction on $n$.

When $n = 1$, by applying the transformations $T_1$, $T_2$, $T_3$, $T_3$, $T_3$, $T_3$ and $T_3$ to $G_1$,
we obtain the graph $K_3$. Hence $\eta(G_1) = 0$.

We now assume that the result is true for $n - 1$. Now by applying $T_1$, $T_2$, $T_3$, $T_3$,
$T_3$, $T_3$ to the three consecutive hexagons $H_1$, $H_2$, $H_3$ in $G_n$, we get $G_{n-1}$. Hence by
induction the proof is complete.


The graph $G$ obtained from $G_1$ in $\mathcal{F}$ by attaching a path $P_6$ at a vertex of degree 2
in $C_5$ is the molecular graph corresponding to the cholesterol compound. This graph
is given in Fig. 8.5. The following theorem shows that the cholesterol compound is
chemically stable.


Theorem 8.30 Let $G$ be the graph obtained from $G_n \in \mathcal{F}$ by attaching a path $P_r$
at a vertex of degree 2 in $C_5$. Then $\eta(G) = 0$.

Proof If $r$ is odd, then by applying $T_3$ on the attached path $\frac{r-1}{2}$ times, we get $G_n$.
If $r$ is even, then by applying the transformations given in Theorem 8.29 to $G_n$, we
get the graph isomorphic to $C_3$ together with the path $P_r$ attached to $C_3$. Now by
applying $T_3$ to the path $P_r$ $\frac{r}{2}$ times, we obtain $K_2$. Hence $\eta(G) = 0$.


We now define another family of graphs $\mathcal{B} = \{G_1, G_2, \ldots, G_n, \ldots\}$ starting from
the graph $G_1$ given in Fig. 8.6 and the hexagon $C_6$. For $n \ge 2$, the graph $G_n$ is
obtained by identifying the vertical edge of the last hexagon in $G_{n-1}$ and the vertical
edge of $C_6$.

The graphs $G_2$ and $G_3$ in $\mathcal{B}$ are given in Fig. 8.7.

The graphs given in the family $\mathcal{B}$ represent the benzoacene monoradical series
given in Dias [9, Fig. 8].


Fig. 8.6 The graphs $G_1$ and $C_6$


Fig. 8.7 The graphs $G_2$ and $G_3$


Fig. 8.8 The graph $H$


Theorem 8.31 Let $G_n \in \mathcal{B}$. Then $\eta(G_n) = 1$.


Proof The proof is by induction on $n$. Applying the transformation $T_3$ three times
to $G_1$, we obtain the graph $H$ given in Fig. 8.8.

Since $H$ is a bipartite graph of odd order, it follows that $\eta(H) > 0$ and hence
$\eta(G_1) > 0$. Now we shall compute the coefficient $a_6$ in the characteristic polynomial
$\sum_i a_i x^{n-i}$ using Sachs' theorem. Any elementary subgraph of order 6 is either $C_4 \cup K_2$
or $3K_2$ or $C_6$. The number of elementary subgraphs of type $C_4 \cup K_2$ is 6, the
number of elementary subgraphs of type $3K_2$ is 11 and the number of elementary
subgraphs of type $C_6$ is 4. Therefore, $a_6 = 12 - 11 - 8 = -7$. Hence $\eta(H) = 1$.
Hence it follows that $\eta(G_1) = 1$.

For $G_2$, by applying the transformations $T_1$, $T_2$ and $T_3$, we obtain $K_1$ and hence
$\eta(G_2) = 1$.

For $n \ge 3$, by applying $T_1$, $T_2$, $T_3$, $T_3$ on the right side of $G_n$, we obtain $G_{n-2}$.
Hence by induction, it follows that $\eta(G_n) = 1$.


Theorem 8.32 Let $G_n^{2k}$ be the graph obtained from the path $P_n = (v_1, v_2, \ldots, v_n)$
by replacing every edge of $P_n$ by an even cycle $C_{2k}$. Then $\eta(G_n^{2k}) = n$.


Proof The proof is by induction on $k$. For $k = 2$, $G_n^{4}$ is the graph obtained from $P_n$
by replacing every edge by $C_4$. There are $(n - 1)$ copies of $C_4$ and by applying the
transformation $T_2$ for each cycle, we obtain $\overline{K_n}$. Therefore, $\eta(G_n^{4}) = n$.

We now assume that the result is true for $k - 1$. Consider $G_n^{2k}$. For each cycle
$C_{2k}$ in $G_n^{2k}$ we apply the transformation $T_1$. This gives the graph $G_n^{2(k-1)}$. Hence by
induction, $\eta(G_n^{2k}) = n$.


Theorem 8.33 Let $G_n$ be the graph obtained from the path $P_n = (v_1, v_2, \ldots, v_n)$
with $n \ge 3$, by adding $n - 2$ vertices $u_1, u_2, \ldots, u_{n-2}$ and joining $u_i$ with $v_i$ and
$v_{i+2}$ by edges. Then

$$\eta(G_n) = \begin{cases} 0 & \text{if } n \text{ is even}, \\ 2 & \text{if } n \text{ is odd}. \end{cases}$$

Proof By applying the transformations $T_1$ and $T_3$ to $G_n$, we obtain $G_{n-2}$. Hence
$\eta(G_n) = \eta(G_{n-2})$. Now $G_3 = C_4$ and hence $\eta(G_3) = 2$. For the graph $G_4$, by applying
the transformation $T_2$, we get $P_4$. Hence $\eta(G_4) = 0$.

Hence $\eta(G_n) = 0$ if $n$ is even and $\eta(G_n) = 2$ if $n$ is odd.
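The parity pattern in Theorem 8.33 can be observed directly for small $n$. This is a sketch under the assumption that $v_1, \ldots, v_n$ are indexed $0, \ldots, n-1$ and $u_1, \ldots, u_{n-2}$ are indexed $n, \ldots, 2n-3$; `bridged_path` is a hypothetical name for the construction.

```python
from fractions import Fraction

def nullity(adj):
    """Return n - rank(adj), using exact rational Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in adj]
    n, rank = len(m), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank

def bridged_path(n):
    """Path v_1..v_n plus vertices u_1..u_{n-2}, with u_i joined to v_i and v_{i+2}."""
    size = 2 * n - 2
    a = [[0] * size for _ in range(size)]

    def link(x, y):
        a[x][y] = a[y][x] = 1

    for i in range(n - 1):      # path edges v_i v_{i+1}
        link(i, i + 1)
    for i in range(n - 2):      # u_{i+1} is stored at index n + i
        link(n + i, i)
        link(n + i, i + 2)
    return a

print([nullity(bridged_path(n)) for n in range(3, 7)])  # [2, 0, 2, 0]
```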


8.6 The Nullity Set of a Family of Graphs 


Let $\mathcal{F}$ be a family of graphs. The nullity set $N(\mathcal{F})$ of $\mathcal{F}$ is defined by
$N(\mathcal{F}) = \{\eta(G) : G \in \mathcal{F}\}$.


Example 1 


Let F be the family of all line graphs of a tree. It follows from Theorem 8.23 that 
N(F) = {0, 1}. Thus for an infinite family of graphs, the nullity set may be a finite 
set. 


Example 2 


Let $\mathcal{F}$ be the family of all line graphs. If $G = pK_2$, then $L(G) = \overline{K_p}$ and hence
$\eta(L(G)) = p$. Thus for any positive integer $p$, there exists a graph $G$ such that
$\eta(L(G)) = p$. Hence $N(\mathcal{F}) = \{0, 1, 2, 3, \ldots, n, \ldots\}$.


Example 3 


Let $\mathcal{F} = \{L(G) : G \text{ is a connected graph}\}$. Consider the graph $G_r$ consisting of $r$
cycles of length 4 and edges joining vertices of cycles as shown in Fig. 8.9.

It can be easily verified that $\eta(L(G_r)) = r + 1$. Hence $N(\mathcal{F}) = \{0, 1, 2, \ldots, n, \ldots\}$.

Fig. 8.9 The graph $G_r$ with $\eta(L(G_r)) = r + 1$


Fig. 8.10 Trees with nullity $n - 4$


Fig. 8.11 Trees with nullity $n - 6$


Let $\mathcal{T}_n$ be the set of all trees of order $n$. We now proceed to determine $N(\mathcal{T}_n)$. We
need the following theorems.


Theorem 8.34 (Ellingham [10]) Let $T \in \mathcal{T}_n$. Then $\eta(T) \le n - 2$ and equality holds
if and only if $T$ is isomorphic to the star $S_n = K_{1,n-1}$.


Theorem 8.35 (Li [22]) Let $T \in \mathcal{T}_n - \{S_n\}$. Then $\eta(T) \le n - 4$ and equality holds
if and only if $T$ is isomorphic to one of the trees $T_1$ or $T_2$ given in Fig. 8.10.


Theorem 8.36 (Li [22]) Let $T \in \mathcal{T}_n - \{S_n, T_1, T_2\}$. Then $\eta(T) \le n - 6$ and equality
holds if and only if $T$ is isomorphic to one of the trees $T_3$, $T_4$ or $T_5$ given in Fig. 8.11.


In general, we can use trees in $\mathcal{T}_n$ whose nullity is $n - 6$ to determine trees of order
$n$ with nullity $n - 8$ (Li [22]). Hence, we have the following.


Theorem 8.37 The nullity set of $\mathcal{T}_n$ is $\{0, 2, 4, \ldots, n - 4, n - 2\}$ if $n$ is even and
$\{1, 3, 5, \ldots, n - 4, n - 2\}$ otherwise.


Let $\mathcal{U}_n$ denote the set of all unicyclic graphs of order $n$ and let $\mathcal{B}_n$ denote the set of
all bicyclic graphs of order $n$. Xuezhong and Liu [33] have determined the nullity
set of $\mathcal{U}_n$ where $n \ge 5$.


Theorem 8.38 (Xuezhong and Liu [33]) The nullity set of $\mathcal{U}_n$, $n \ge 5$, is
$\{0, 1, 2, \ldots, n - 4\}$.


Hu et al. [20] and Li [22] determined the nullity set of $\mathcal{B}_n$.


Theorem 8.39 (Hu et al. [20], Li [22]) The nullity set of $\mathcal{B}_n$ is $\{0, 1, 2, \ldots, n - 2\}$.


8.7. Graphs with Maximum Nullity 


If $G$ is a graph of order $n$, then $\eta(G) = n$ if and only if $G = \overline{K_n}$. Since the
eigenvalues of the adjacency matrix satisfy $\sum_{i} \lambda_i = 0$, a graph cannot have exactly
one nonzero eigenvalue, so there is no graph $G$ with $\eta(G) = n - 1$. Thus if $G$ is a graph
of order $n$ and $G \ne \overline{K_n}$, then $0 \le \eta(G) \le n - 2$. Hence the natural problem is to
characterize graphs with maximum nullity $n - 2$, second maximum nullity $n - 3$
and so on. Substantial progress has been made in this direction. Since any isolated
vertex contributes 1 to $\eta(G)$, throughout this section we assume that $G$ is a graph of
order $n$ having no isolated vertices.

Cheng and Liu [4] obtained a characterization of graphs with nullity n — 2 and 
nullity n — 3. 


Theorem 8.40 (Cheng and Liu [4]) Let $G$ be a graph of order $n$ having no isolated
vertices. Then


1. $\eta(G) = n - 2$ if and only if $G$ is isomorphic to a complete bipartite graph $K_{n_1,n_2}$,
where $n_1 > 0$, $n_2 > 0$ and $n_1 + n_2 = n$.

2. $\eta(G) = n - 3$ if and only if $G$ is isomorphic to a complete tripartite graph
$K_{n_1,n_2,n_3}$, where $n_1, n_2, n_3 > 0$ and $n_1 + n_2 + n_3 = n$.
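The "if" directions of Theorem 8.40 can be illustrated on small examples. The sketch below computes the nullity as $n$ minus the rank of the adjacency matrix; `complete_multipartite` is an illustrative helper, not notation from the text.

```python
from fractions import Fraction

def nullity(adj):
    """Return n - rank(adj), using exact rational Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in adj]
    n, rank = len(m), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n):
            if m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank

def complete_multipartite(*sizes):
    """Adjacency matrix of K_{n_1,...,n_r}: two vertices are adjacent
    exactly when they lie in different parts."""
    part = [p for p, size in enumerate(sizes) for _ in range(size)]
    n = len(part)
    return [[1 if part[i] != part[j] else 0 for j in range(n)] for i in range(n)]

print(nullity(complete_multipartite(2, 3)))     # 3 = n - 2 with n = 5
print(nullity(complete_multipartite(1, 2, 3)))  # 3 = n - 3 with n = 6
```

The star $K_{1,n-1}$ of Theorem 8.34 is the special case `complete_multipartite(1, n - 1)`.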


There are several results on graphs with nullity $n - 4$. Li [22] obtained a characterization
of graphs of order $n$ with $\delta = 1$ and nullity $n - 4$. Let $G_1^*$ be a graph of order
$n$ obtained from $K_{r,s}$ and $K_{1,t}$ by identifying a vertex of $K_{r,s}$ with the center of $K_{1,t}$,
where $r, s, t \ge 1$ and $r + s + t = n$. Let $G_2$ be the graph obtained from the complete
tripartite graph $K_{1,l,m}$ and the star $K_{1,p}$ by identifying a vertex $v$ of maximum degree
in $K_{1,l,m}$ with the center of $K_{1,p}$, where $l, m, p \ge 1$ and $l + m + p + 1 = n$. The
graphs $G_1^*$ and $G_2$ are shown in Fig. 8.12.


Theorem 8.41 (Li [22]) Let $G$ be a connected graph of order $n$ with minimum degree
$\delta = 1$. Then $\eta(G) = n - 4$ if and only if $G$ is isomorphic to the graph $G_1^*$ or $G_2^*$, where
$G_1^*$ is given in Fig. 8.12 and $G_2^*$ is a connected spanning subgraph of $G_2$ given in
Fig. 8.12, which contains $K_{l,m}$ as its subgraph.


Corollary 8.8 (Li [22]) Let $T$ be a tree of order $n$ and $T \ne K_{1,n-1}$. Then $\eta(T) =
n - 4$ if and only if $T$ is isomorphic to one of the trees $T_1$ or $T_2$ given in Fig. 8.13.


Chang et al. [3] obtained a characterization of all graphs with rank 4. Since the
sum of the rank and nullity is $n$, this gives a characterization of all graphs with nullity
$n - 4$.

Fig. 8.12 The graphs $G_1^*$ and $G_2$


Fig. 8.13 Trees with nullity $n - 4$

Definition 8.6 Let $G$ be a graph with $V(G) = \{v_1, v_2, \ldots, v_n\}$. Let $m = (m_1, m_2,
\ldots, m_n)$ be a vector of positive integers. The graph obtained from $G$ by replacing
each vertex $v_i$ with an independent set $\{v_i^1, v_i^2, \ldots, v_i^{m_i}\}$ and joining $v_i^s$ with $v_j^t$ if and
only if $v_i v_j \in E(G)$ is denoted by $G \circ m$. This graph is said to be obtained from $G$
by multiplication of vertices.


Theorem 8.42 (Chang et al. [3]) Let $G$ be a connected graph of order $n$. Then
$r(G) = 4$ (or equivalently $\eta(G) = n - 4$) if and only if $G$ can be obtained from one
of the graphs given in Fig. 8.14 by multiplication of vertices.


Xuezhong and Liu [33] obtained a characterization of unicyclic graphs $G$ of order
$n$ with $\eta(G) = n - 4$. This result is also included in Borovicanin and Gutman [2].


Theorem 8.43 (Xuezhong and Liu [33]) Let $G$ be a unicyclic graph of order $n$
with $n \ge 5$. Then $\eta(G) = n - 4$ if and only if $G$ is isomorphic to one of the graphs
$U_1$, $U_2$, $U_3$, $U_4$ or $U_5$ given in Fig. 8.15.


Li et al. [21] and Hu et al. [20] independently characterized bicyclic graphs of
order $n$ with $\eta(G) = n - 4$. This result has also been reported in the survey paper
Borovicanin and Gutman [2].

Let $G$ be a connected graph of order $n$ and size $m$. The graph $G$ is called a bicyclic
graph if $m = n + 1$. Let $\mathcal{B}_n$ denote the set of all bicyclic graphs of order $n$. The graph
obtained from two cycles $C_k$ and $C_l$ by joining a vertex of $C_k$ to a vertex of $C_l$ by a
path of length $q - 1$ is a bicyclic graph and is denoted by $B(k, q, l)$. Such a graph
is called an $\infty$-graph. A graph consisting of three internally disjoint $u$-$v$ paths of
lengths $l$, $p$ and $q$, where $l, p, q \ge 1$ and at most one of them is 1, is called a $\theta$-graph
and is denoted by $P(l, p, q)$. There are three types of bicyclic graphs. The first type,
denoted by $\mathcal{B}_n^{+}$, is the set of all graphs obtained from an $\infty$-graph with trees attached
when $q = 1$. The second type, $\mathcal{B}_n^{++}$, is the set of all graphs obtained from an $\infty$-graph
with trees attached when $q > 1$.

The third type, denoted by $\theta_n$, is the set of all graphs obtained from a $\theta$-graph
with trees attached. Then $\mathcal{B}_n = \mathcal{B}_n^{+} \cup \mathcal{B}_n^{++} \cup \theta_n$.

Fig. 8.14 Graphs for Theorem 8.42


Fig. 8.15 Unicyclic graphs with $\eta(G) = n - 4$


Theorem 8.44 (Li et al. [21]) Let G be a bicyclic graph of order n in 6,. Then 
$\eta(G) = n - 4$ if and only if $G$ is isomorphic to one of the graphs $B_i$, $1 \le i \le 6$,
given in Fig. 8.16. 


Fig. 8.16 Bicyclic graphs with $\eta(G) = n - 4$


Another characterization of graphs with $\eta(G) = n - 4$ is given in Cheng and Liu
[5].

Let $G$ be an $r$-partite graph with partite sets $X_1, X_2, \ldots, X_r$ where $r \ge 2$. Let
$|X_i| = n_i$.


Definition 8.7 A chain-like $r$-partite graph is a simple $r$-partite graph with partition
$(X_1, X_2, \ldots, X_r)$ in which each vertex of $X_i$ is joined to each vertex of $X_{i+1}$ for
$i = 1, 2, \ldots, r - 1$. This graph is denoted by $P_{n_1,n_2,\ldots,n_r}$, where $|X_i| = n_i$. For
$r \ge 4$, $P^{(1)}_{n_1,n_2,\ldots,n_r}$ is the graph obtained from $P_{n_1,n_2,\ldots,n_r}$ by joining each vertex of $X_2$
to each vertex of $X_4$ by an edge. For $r \ge 5$, the graph $P^{(2)}_{n_1,n_2,\ldots,n_r}$ is the graph obtained
from $P_{n_1,n_2,\ldots,n_r}$ by joining each vertex of $X_1$ to each vertex of $X_4$ and $X_5$ by an edge.
For $r \ge 6$, the graph $P^{(3)}_{n_1,n_2,\ldots,n_r}$ is the graph obtained from $P^{(2)}_{n_1,n_2,\ldots,n_r}$ by joining each
vertex of $X_6$ to each vertex of $X_2$ and $X_3$ by an edge.


Theorem 8.45 (Cheng and Liu [5]) Let $G$ be a connected graph of order $n$ with
$n \ge 4$. Then $\eta(G) = n - 4$ if and only if $G$ is isomorphic to $K_{n_1,n_2,\ldots,n_r}$ ($r = 4$),
$P_{n_1,n_2,\ldots,n_r}$ ($r = 4$ or 5), $P^{(1)}_{n_1,n_2,\ldots,n_r}$ ($r = 4$ or 5), $P^{(2)}_{n_1,n_2,\ldots,n_r}$ ($r = 5$ or 6) or
$P^{(3)}_{n_1,n_2,\ldots,n_r}$ ($r = 6$), where $n_1 + n_2 + \cdots + n_r = n$ and $n_i > 0$ for all $i = 1, 2, \ldots, r$.


Li [22] obtained a characterization of all graphs with nullity $n - 5$.

Let $G_3^*$ be a graph of order $n$ obtained from the complete tripartite graph $K_{r,s,t}$
and $K_{1,q}$ by identifying a vertex of $K_{r,s,t}$ with the center of $K_{1,q}$, where $r, s, t, q > 0$
and $r + s + t + q = n$.

Let $G_4$ be the graph obtained from $K_{1,l,m,p}$ and $K_{1,d}$ by identifying the vertex
$v$ of maximum degree in $K_{1,l,m,p}$ with the center of $K_{1,d}$, where $l, m, p, d > 0$ and
$l + m + p + d + 1 = n$.




Fig. 8.17 Graphs $G_3^*$ and $G_4$


Theorem 8.46 (Li [22]) Let $G$ be a connected graph of order $n$. Then $\eta(G) = n - 5$
if and only if $G$ is isomorphic to the graph $G_3^*$ or $G_4^*$, where $G_3^*$ is given in Fig. 8.17
and $G_4^*$ is a connected spanning subgraph of $G_4$ given in Fig. 8.17 such that $G_4^*$
contains $K_{l,m,p}$ as a subgraph.


8.8 Conclusion 


In this paper, we have presented a survey of general mathematical results on the 
nullity of graphs along with chemically relevant aspects and some new results. Open 
problems for further investigation have also been included. Though we have covered 
most of the significant results on nullity, this survey is not exhaustive.


Chapter 9
Some Observations on Algebraic Connectivity of Graphs


Ravindra B. Bapat, A. K. Lal, and S. Pati


9.1 Introduction


Let $G = (V, E)$ be a connected simple graph on the vertex set $V = \{1, 2, \ldots, n\}$
and the edge set $E$. For any two vertices $i$ and $j$, $i \ne j$, by $i \sim j$ we will mean that
$i$ is adjacent to $j$, that is, the edge $[i, j] \in E$. The adjacency matrix $A(G)$ of $G$ is
an $n \times n$ matrix with the $(i, j)$-th entry 1 if $i \sim j$ and 0 otherwise. The Laplacian
matrix $L(G)$, in short $L$, is defined as $D(G) - A(G)$, where $D(G)$ is the diagonal
degree matrix of $G$. It is well known that the Laplacian matrix is a positive semidefinite
matrix with 0 as the smallest eigenvalue and $\mathbb{1}$, the vector of all ones, as a
corresponding eigenvector. Fiedler [7] observed that the second smallest eigenvalue
of $L(G)$ is positive if and only if $G$ is connected. This eigenvalue is viewed as an
algebraic measure of the connectivity of $G$ and known as the algebraic connectivity
of $G$. It is denoted by $a(G)$.

The eigenvectors of $L(G)$ corresponding to the algebraic connectivity are popularly
known as the Fiedler vectors of $G$. The literature is full of interesting research
pieces characterizing graphs that have the smallest (or largest) algebraic connectiv- 
ities among certain classes of graphs, see, for example, Fallat et al. [6], Fallat and 
Kirkland [5], Guo [8], Kirkland and Fallat [9], Kirkland et al. [11], Merris [13], Patra 
and Lal [14], Shao et al. [15]. 

The first such results were obtained by Fiedler himself. It follows from results 4.1,
4.3, and 4.4 of Fiedler [7] that among all trees on $n > 1$ vertices, the path has the
smallest algebraic connectivity and the star has the largest algebraic connectivity.

Let $G$ be a connected graph and $y$ be a Fiedler vector. A vertex $v$ is called a
characteristic vertex of $G$ with respect to $y$ if $y(v) = 0$ and $v$ has a neighbor $w$ such
that $y(w) \ne 0$. An edge $[u, v]$ is called a characteristic edge with respect to $y$ if
$y(u)y(v) < 0$. The collection of all characteristic elements of $G$ with respect to $y$
is called the characteristic set of $G$. Following Bapat and Pati [1], we denote it by
$C(G, y)$.

There are many interesting facts known about the characteristic vertices, edges
and set, see Fiedler [7], Merris [12, 13], Kirkland et al. [11], Kirkland and Neumann
[10], Kirkland and Fallat [9], Fallat and Kirkland [5], Bapat and Pati [1], Bapat et al.
[2]. For example, for trees it is known that $|C(T, y)| = 1$. A general result about the
cardinality of the characteristic set can be found in Bapat and Pati [1]. It is known
that, if for some Fiedler vector $y$ we get $C(G, y) = \{u\}$, then for every Fiedler vector
$z$ of $G$, we will have $C(G, z) = \{u\}$, see Bapat et al. [2]. For a connected graph $G$,
if $C(G, y)$ is not a singleton vertex, then it always lies in the same block (a maximal
induced subgraph without a cut vertex in it) of $G$, independent of the different Fiedler
vectors, see Bapat et al. [2]. In Bapat et al. [2], the authors have shown that for any
connected graph $G$ and any Fiedler vector $y$, if $C(G, y)$ is an edge, then $a(G)$ is
simple.

One of the key techniques in understanding the algebraic connectivity of trees
is that of a Perron branch, a concept introduced by Kirkland et al. [11]. Let $G$ be
connected and $W \subseteq V(G)$. Then, a branch of $G$ at $W$ is any connected component
of $G - W$. Let $B$ be a branch of $G$ at $W$. Denote by $L(B)$ the principal submatrix
of $L(G)$ corresponding to $B$. It can be argued that $L(B)$ is a nonsingular M-matrix
and hence has an entrywise positive inverse. Let us denote $L(B)^{-1}$ by $B[B]$ and the
Perron value of $B[B]$ by $\rho(B)$.

Let $G$ be a connected graph and $u$ be any cut vertex. A branch $B$ at $u$ is called
a Perron branch if $\rho(B)$ is the largest among all branches at $u$. Note that this is
equivalent to saying that $\rho(B) \ge 1/a(G)$ (see Bapat et al. [2]). If $T$ is a tree and $B$ is a
branch at $u$, then an explicit description of $B[B]$ is known (see Fallat and Kirkland
[5], Kirkland and Neumann [10], Kirkland et al. [11]). For a unicyclic graph, a
corresponding description can be found in Bapat et al. [2, 3].

Using the concept of Perron components the following result was proved. Let $T$
be a tree and $B$ be a non-Perron branch at $u$. (In this case, it is known that $B$ cannot
contain the characteristic set.) Let $T'$ be obtained by moving a leaf in $B$ away from $u$.
Then, $a(T') \le a(T)$ (see Kirkland and Fallat [9]), that is, as we move an edge away
from the characteristic set the algebraic connectivity tends to decrease, see Fig. 9.1.

Fig. 9.2 $H$ is obtained from $G$ by sliding $C$ along $P$ one unit away from $u = 1$


This happens because the bottleneck matrix $B[B'] \ge B[B]$ (entrywise domination),
thereby increasing the Perron value. Notice that, in the above example, we are
sliding the tree $T''$ along a path, away from $u$.

A more general view of the above operation is ‘sliding a connected graph, not 
necessarily a tree, along a path’. Here, we allow the path to contain more than one 
vertex of the graph. We give a formal definition below. 


Definition 9.1 Let $G$ be a connected graph, $u$ be a cut vertex and $B$ be a non-Perron
branch at $u$. Let $u_1$ be the only vertex in $B$ adjacent to $u$ and let $P = [u_1, \ldots, u_\ell]$
be a path in $B$. Let $C$ be a connected subgraph of $B$ such that $V(C) \cap V(P) =
\{u_{k+1}, u_{k+2}, \ldots, u_{k+r}\}$. Here, we assume $k + r < \ell$ and the edges of $P$ that are not
in $C$ are bridges of $P$. Replace each edge of $E(C) - E(P)$ of the form $[x, u_i]$ with
$[x, u_{i+1}]$, if $x$ is not on $P$. Replace each edge of $E(C) - E(P)$ of the form $[u_i, u_j]$
with $[u_{i+1}, u_{j+1}]$. We call this operation 'sliding $C$ along the path $P$ by one unit
away from $u$'.


Example 1 


(i) Consider the graph $G$ in Fig. 9.2. It may have more vertices on the left
side. Take $u = 1$. Then, $B$ is the component at 1 that contains 2. Here $P =
[2, 3, 4, 5, 6, 7]$, $u_1 = 2$, $u_{k+1} = 4$, $u_{k+r} = 6$, and $u_\ell = 7$. The graph $C$ is the
one inside the dotted curve. The graph $H$ is now obtained by moving $C$ one unit
away from 1 along the path $P$.

Fig. 9.3 Left side graph: $P(X)$. Right side graph: $P(X)_1$


(ii) In general, consider a path $P = [0, 1, \ldots, n]$ and a connected graph $X$ on vertices
$l, l + 1, \ldots, n - 1, n + 1, \ldots, m$ (where $l \ge 1$) such that $X$ contains the
path $[l, l + 1, \ldots, n - 1]$. Let $P(X)$ denote the graph $P \cup X$ and $P(X)_1$ denote
the graph obtained by sliding $X$ along $P$ one unit away from 0 (see Fig. 9.3).
Also, let us denote the isomorphic copy of the subgraph $X$ in $P(X)_1$ by $X_1$.
Note that the isomorphism $\phi$ from $X$ to $X_1$ is given by $\phi(i) = i + 1$ whenever
$l \le i \le n - 1$ and $\phi(i) = i$ for $i \ge n + 1$.


Recall that in Fig. 9.1, we had $a(T') \le a(T)$ owing to the fact that $B[B'] \ge B[B]$
entrywise. We need to make this domination work in the general case. This is our main
result.


Theorem 9.1 Let the graphs $X$, $G = P(X)$, $X_1$ and $H = P(X)_1$ be as shown in
Example 1 (ii). Then, $B[G - 0]$ is entrywise dominated by $B[H - 0]$.


In order to do the above, we first give a less known combinatorial description of 
the bottleneck matrix and some of its immediate useful implications. This is done 
in Sect. 9.2. In Sect.9.3, we prove our main result and use it to obtain comparisons 
between the algebraic connectivity of a few graphs. 


9.2 Describing the Bottleneck Matrix 


For a square matrix $A$, by $A(i_1, \ldots, i_k | j_1, \ldots, j_k)$ we mean the submatrix obtained
by deleting the rows $i_1, \ldots, i_k$ and the columns $j_1, \ldots, j_k$ of $A$. Recall that the matrix-tree
theorem says that $\det L(i|j) = (-1)^{i+j} \kappa(G)$, where $\kappa(G)$ is the number of spanning
trees in $G$. Let $s, t \ge 1$. Then, by a forest of $G$ of the type $(i_1, \ldots, i_s \| j_1, \ldots, j_t)$,
we mean a spanning forest of G having two trees, one containing the vertices $i_1, \ldots, i_s$
(and maybe some more vertices) and the other containing the vertices $j_1, \ldots, j_t$ (and
maybe some more vertices). By $N_{G(i_1,\ldots,i_s \| j_1,\ldots,j_t)}$ we denote the number of spanning
forests of $G$ of the type $(i_1, \ldots, i_s \| j_1, \ldots, j_t)$. With the above notations, we have
our first result, which is a direct application of Chaiken's result, see Chaiken [4].


Lemma 9.1 Let $G$ be a connected graph on vertices $0, 1, \ldots, n$ and $0 < i, j$ (we
allow $i = j$). Then, $\det L(0, i | 0, j) = (-1)^{i+j} N_{G(0 \| i, j)}$.

Proof Form a complete loop-less directed graph $H$ from $G$ by replacing each undirected
edge $[i, j]$ in $G$ with two directed oppositely oriented edges $(i, j)$ and $(j, i)$
of weights 1 each. For each pair $\{i, j\}$ which is not an edge in $G$, draw the edges
$(i, j)$ and $(j, i)$ with weights 0.

Take $S = \{0, 1, \ldots, n\} = V(H)$. Then, the matrix $A$ as defined in Eq. (1) of
Chaiken [4] is nothing but $L(G)$.

Now, take $W = \{0, i\}$ and $U = \{0, j\}$. Then, $\epsilon(W, S) = (-1)^{i-1}$ and $\epsilon(U, S) =
(-1)^{j-1}$, see the lines below Eq. (3) on page 321 of Chaiken [4].

Now, take $\overline{W} = \{1, \ldots, i - 1, i + 1, \ldots, n\}$ and $\overline{U} = \{1, \ldots, j - 1,
j + 1, \ldots, n\}$. Then, for any matching $\pi : \overline{W} \to \overline{U}$, we get a path from $j$ to $i$ and this
is the only path defined by $\pi$ (see the first paragraph of Section 2 of Chaiken [4]).
Thus, in our case, the bijection $\pi^* : \{0, i\} \to \{0, j\}$ is $\pi^*(0) = 0$ and $\pi^*(i) = j$.
Note that $\pi^*$ has no inversions. Hence $\epsilon(\pi^*) = 1$.

Note that in our notation $A(W|U)$ is nothing but $A(\overline{W}|\overline{U})$ in Chaiken [4]. Thus,
from the all minors matrix-tree theorem in Chaiken [4], we have

$$\det A(W|U) = \epsilon(W, S)\,\epsilon(U, S) \sum_{F} \epsilon(\pi^*)\, a_F,$$

where the sum is over all spanning forests $F$ in $H$ such that the following holds:


1. $F$ contains exactly $|W| = |U| = 2$ trees.

2. Each tree in $F$ contains exactly one vertex in $U$ and exactly one vertex in $W$.
That is, one tree $T_0$ contains the vertex 0 and the other tree, say $T_1$, contains the
vertices $i$ and $j$.

3. Each arc in $F$ is directed away from the vertex in $U$ of the tree containing that
arc. This means that $T_0$ is rooted at 0 and all its arcs are directed away from 0 and
hence it may be viewed as an undirected tree. The same holds for the tree $T_1$.

4. $a_F$ is the product of the weights of the edges in $F$.


In our case, note that $a_F = 1$ if $F$ corresponds to a spanning forest in $G$, otherwise
$a_F = 0$. Therefore,

$$\det L(0, i | 0, j) = \det A(W|U) = (-1)^{i+j} \sum_{F} a_F = (-1)^{i+j} N_{G(0 \| i, j)}.$$


The proof is complete. 


We remark here that, one could directly give a proof of the previous result using 
the properties of the vertex edge incidence matrix. We have chosen the above proof 
as it is a direct application of the theorem in Chaiken [4]. 

As an immediate corollary, we present a description of the (i, j)-entry of a bot- 
tleneck matrix. 


Corollary 9.1 Let $G$ be a connected graph on vertices $0, 1, \ldots, n$ and $0 < i, j$ (we
allow $i = j$). Then,

$$B[G - 0](i, j) = \frac{N_{G(0 \| i, j)}}{\kappa(G)}.$$


Proof Note that $L(0|0)$ is symmetric and $B[G - 0] = L(0|0)^{-1}$. Hence,

$$B[G - 0](i, j) = (-1)^{i+j} \frac{\det L(0, i | 0, j)}{\det L(0|0)} = \frac{N_{G(0 \| i, j)}}{\kappa(G)},$$

as claimed.
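Corollary 9.1 can be verified by brute force on a small graph. The sketch below is only an illustration: the example graph (a triangle 0-1-2 with a pendant vertex 3) and all function names are assumptions, and forests are counted by enumerating edge subsets, which is feasible only for tiny graphs. It checks that the matrix of ratios $N_{G(0\|i,j)}/\kappa(G)$ is the inverse of $L(0|0)$.

```python
from fractions import Fraction
from itertools import combinations

def component_labels(n_vertices, edge_subset):
    """Label every vertex by a representative of its connected component."""
    label = list(range(n_vertices))

    def find(x):
        while label[x] != x:
            x = label[x]
        return x

    for a, b in edge_subset:
        label[find(a)] = find(b)
    return [find(v) for v in range(n_vertices)]

def count_spanning_trees(n_vertices, edges):
    # n-1 edges forming a single component are exactly the spanning trees.
    return sum(len(set(component_labels(n_vertices, s))) == 1
               for s in combinations(edges, n_vertices - 1))

def count_forests(n_vertices, edges, i, j):
    """Number of two-tree spanning forests of type (0 || i, j)."""
    total = 0
    for s in combinations(edges, n_vertices - 2):
        lab = component_labels(n_vertices, s)
        if len(set(lab)) == 2 and lab[i] == lab[j] and lab[i] != lab[0]:
            total += 1
    return total

def reduced_laplacian(n_vertices, edges):
    """Laplacian with the row and column of vertex 0 deleted, i.e. L(0|0)."""
    lap = [[0] * n_vertices for _ in range(n_vertices)]
    for a, b in edges:
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1
    return [row[1:] for row in lap[1:]]

def is_inverse_pair(a, b):
    n = len(a)
    return all(sum(a[i][k] * b[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]   # hypothetical example graph
N = 4
kappa = count_spanning_trees(N, EDGES)
bottleneck = [[Fraction(count_forests(N, EDGES, i, j), kappa)
               for j in range(1, N)] for i in range(1, N)]
print(kappa)                                                    # 3
print(is_inverse_pair(reduced_laplacian(N, EDGES), bottleneck))  # True
```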


We use Corollary 9.1 to obtain quite a few well-known results. To proceed with
our first result, let $G$ be a connected graph and let $B$ be a branch at a vertex $u$. As
the matrix $L(B)$ is a diagonally dominant nonsingular M-matrix, its inverse has all
positive entries and moreover $(L(B)^{-1})_{ii} \ge (L(B)^{-1})_{ij}$ for all $i \ne j$. A combinatorial
explanation of this is stated next.


Corollary 9.2 Let $G$ be a connected graph on vertices $0, 1, \ldots, n$ and let $i > 0$.
Then, $B[G - 0](i, i) \ge B[G - 0](i, j)$ for each $j > 0$ with $j \ne i$. The inequality is
strict if there is a forest of the type $(0, j \| i)$.


Proof Using Corollary 9.1, we have

$$\kappa(G) B[G - 0](i, i) - \kappa(G) B[G - 0](i, j) = N_{G(0 \| i)} - N_{G(0 \| i, j)}.$$

Now, notice that each forest of the type $(0 \| i, j)$ is also a forest of the type $(0 \| i)$.
Hence $N_{G(0 \| i)} - N_{G(0 \| i, j)} \ge 0$. If there is at least one forest of the type $(0, j \| i)$,
then its contribution to $N_{G(0 \| i)} - N_{G(0 \| i, j)}$ is at least 1 and hence, in this case, the
inequality is strict.


Our second corollary is the famous result in Kirkland et al. [11] that gives the 
description of the bottleneck matrix of a tree in terms of common edges. 


Corollary 9.3 (Kirkland et al. [11]) Let $T$ be a tree on vertices $0, 1, \ldots, n$. Then,
$B[T - 0](i, j)$ is the number of edges common to the 0-$i$-path and the 0-$j$-path.


Proof Note that for a tree $T$, a spanning forest of the type $(0 \| i, j)$ can only be
obtained by deleting a common edge of the 0-$i$-path and the 0-$j$-path. Hence, $N_{T(0 \| i, j)}$
equals the number of such common edges.
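As an illustration of Corollary 9.3, the following sketch builds the matrix of common-edge counts for a small tree and checks that it is the inverse of $L(0|0)$. The tree `TREE` and the helper names are hypothetical choices made for this example.

```python
def paths_from_root(n_vertices, edges, root=0):
    """For every vertex v, the set of edges on the unique root-v path of a tree."""
    nbrs = {v: [] for v in range(n_vertices)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    parent, stack = {root: None}, [root]
    while stack:
        x = stack.pop()
        for y in nbrs[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    paths = {}
    for v in range(n_vertices):
        on_path, x = set(), v
        while parent[x] is not None:
            on_path.add((parent[x], x))
            x = parent[x]
        paths[v] = on_path
    return paths

def reduced_laplacian(n_vertices, edges):
    """Laplacian with the row and column of vertex 0 deleted, i.e. L(0|0)."""
    lap = [[0] * n_vertices for _ in range(n_vertices)]
    for a, b in edges:
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1
    return [row[1:] for row in lap[1:]]

def is_inverse_pair(a, b):
    n = len(a)
    return all(sum(a[i][k] * b[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

TREE = [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5)]   # hypothetical tree on vertices 0..5
paths = paths_from_root(6, TREE)
# Entry (i, j) counts the edges shared by the 0-i path and the 0-j path.
common = [[len(paths[i] & paths[j]) for j in range(1, 6)] for i in range(1, 6)]
print(is_inverse_pair(reduced_laplacian(6, TREE), common))  # True
```

In this tree the entry for $(4, 4)$ is 3, the length of the path from 0 to 4, and the entry for $(5, 1)$ is 0 because the two paths leave vertex 0 along different edges.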


As our next corollary, we obtain a description of the bottleneck matrix of a cycle. 


Corollary 9.4 (Bapat et al. [2]) Let $G$ be the cycle $[0, 1, 2, \ldots, n, 0]$ and $0 < i \le j$.
Then,

$$B[G - 0](i, j) = \frac{i(n + 1 - j)}{n + 1}.$$

Proof A spanning forest of the type $(0 \| i, j)$ can be obtained only by deleting an
edge from the 0-$i$-path that does not contain $j$ and by deleting an edge from the
0-$j$-path that does not contain $i$. Thus, there are only $i(n + 1 - j)$ such spanning
forests. As $\kappa(G) = n + 1$, the required result follows.
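The closed form in Corollary 9.4 can be checked by multiplying it against $L(0|0)$. This is a minimal sketch, assuming the cycle is $[0, 1, \ldots, n, 0]$ with $n \ge 2$, so that $L(0|0)$ has 2 on the diagonal and $-1$ between consecutive vertices; the function names are illustrative.

```python
from fractions import Fraction

def cycle_bottleneck(n):
    """Matrix with entries i(n+1-j)/(n+1) for i <= j, extended symmetrically."""
    def entry(i, j):
        lo, hi = min(i, j), max(i, j)
        return Fraction(lo * (n + 1 - hi), n + 1)
    return [[entry(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def cycle_reduced_laplacian(n):
    """L(0|0) for the cycle on n+1 vertices, rows and columns indexed by 1..n."""
    return [[2 if i == j else (-1 if abs(i - j) == 1 else 0)
             for j in range(n)] for i in range(n)]

def is_inverse_pair(a, b):
    n = len(a)
    return all(sum(a[i][k] * b[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

for n in range(2, 8):
    print(n, is_inverse_pair(cycle_reduced_laplacian(n), cycle_bottleneck(n)))
```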


To state and prove the next corollary, we need the following notation. By A(:, i) 
(respectively, A(i, :)), we mean the i-th column (respectively, row) of A. By J, we 
mean the matrix of all ones of an appropriate size. 


Corollary 9.5 (Bapat et al. [3], Fallat and Kirkland [5]) Let $G$ and $H$ be two
connected graphs on vertex sets $0, 1, \ldots, n$ and $n, n + 1, \ldots, k$, respectively. Put
$\delta = B[G - 0](n, n)$. Then,

$$B[G \cup H - 0] = \begin{pmatrix} B[G - 0] & B[G - 0](:, n)\,\mathbb{1}^t \\ \mathbb{1}\,B[G - 0](n, :) & \delta J + B[H - n] \end{pmatrix}.$$

Proof Note that $V(G) \cap V(H) = \{n\}$. Hence

$$N_{G \cup H(0 \| i, j)} = N_{G(0 \| i, j)}\,\kappa(H), \quad \text{for } 1 \le i, j \le n.$$

So,

$$B[G \cup H - 0](i, j) = \frac{N_{G \cup H(0 \| i, j)}}{\kappa(G \cup H)} = \frac{N_{G(0 \| i, j)}\,\kappa(H)}{\kappa(G)\,\kappa(H)} = \frac{N_{G(0 \| i, j)}}{\kappa(G)} = B[G - 0](i, j).$$

If $1 \le i \le n$ and $j > n$ then, a spanning forest of $G \cup H$ of the type $(0 \| i, j)$ uniquely
gives rise to a spanning forest of $G$ of the type $(0 \| i, n)$ and a spanning tree of $H$.
Hence,

$$B[G \cup H - 0](i, j) = \frac{N_{G \cup H(0 \| i, j)}}{\kappa(G \cup H)} = \frac{N_{G(0 \| i, n)}\,\kappa(H)}{\kappa(G)\,\kappa(H)} = \frac{N_{G(0 \| i, n)}}{\kappa(G)} = B[G - 0](i, n).$$

Now, let $j \ge i > n$. Then, a spanning forest of $G \cup H$ of the type $(0 \| i, j)$ gives rise
to


1. either a spanning tree of $G$ and a spanning forest of $H$ of the type $(n \| i, j)$,
2. or a spanning forest of $G$ of the type $(0 \| n)$ and a spanning tree of $H$.


Thus,

$$B[G \cup H - 0](i, j) = \frac{N_{G \cup H(0 \| i, j)}}{\kappa(G \cup H)} = \frac{\kappa(G)\,N_{H(n \| i, j)} + N_{G(0 \| n)}\,\kappa(H)}{\kappa(G)\,\kappa(H)} = B[G - 0](n, n) + B[H - n](i, j) = \delta + B[H - n](i, j).$$

This completes the proof.


We are now ready to state and prove the next corollary.

Corollary 9.6 (Bapat et al. [3] (Shifting)) Let $G_0$, $G_1$, $G_2$, $H$ be connected graphs
on vertices $\{0, 1, \ldots, n\}$, $\{n, n + 1, \ldots, k\}$, $\{k, k + 1, \ldots, r\}$, and $\{n, r + 1, \ldots, s\}$,
respectively. Let $H^*$ be obtained from $H$ by relabeling the vertex $n$ to $k$ (see the figure
below). Then

$$B[G_0 \cup G_1 \cup G_2 \cup H - 0] \le B[G_0 \cup G_1 \cup G_2 \cup H^* - 0].$$

Figure: $G = G_0 \cup G_1 \cup G_2 \cup H$ (left) and $G^* = G_0 \cup G_1 \cup G_2 \cup H^*$ (right)


Proof Put $G = G_0 \cup G_1 \cup G_2 \cup H$ and $G^* = G_0 \cup G_1 \cup G_2 \cup H^*$. As $\kappa(G) =
\kappa(G^*)$, using Corollary 9.1, we have

$$\kappa(G) B[G - 0](i, j) - \kappa(G) B[G^* - 0](i, j) = N_{G(0 \| i, j)} - N_{G^*(0 \| i, j)}.$$

Using Corollary 9.5, we see that

$$B[G_0 \cup G_1 \cup G_2 - 0](n, n) \le B[G_0 \cup G_1 \cup G_2 - 0](k, k).$$

Thus, again using Corollary 9.5, we only need to show that $N_{G(0 \| i, j)} \le N_{G^*(0 \| i, j)}$
whenever $i$ is a vertex of $G_0 \cup G_1 \cup G_2$ and $j$ is a vertex of $H$.


We do so as follows:


Case 1. Assume that $i \in \{1, 2, \ldots, n\}$. Then, a spanning forest of $G$ of the type
$(0 \| i, j)$ is a spanning forest of $G_0$ of the type $(0 \| i, n)$ and spanning trees of $H$, $G_1$
and $G_2$. A similar argument holds for $N_{G^*(0 \| i, j)}$. Hence, in this case,

$$N_{G(0 \| i, j)} = N_{G_0(0 \| i, n)}\,\kappa(H)\,\kappa(G_1)\,\kappa(G_2) = N_{G^*(0 \| i, j)}.$$


Case 2. Assume that $i \in \{n + 1, n + 2, \ldots, k\}$. Then, a spanning forest of $G$ of the
type $(0 \| i, j)$ is a spanning forest of $G_0$ of the type $(0 \| n)$ and spanning trees of
$H$, $G_1$ and $G_2$. So,

$$N_{G(0 \| i, j)} = N_{G_0(0 \| n)}\,\kappa(H)\,\kappa(G_1)\,\kappa(G_2).$$


Now, observe that a spanning forest of $G^*$ of the type $(0 \| i, j)$ gives rise to either


1. a spanning forest of $G_1$ of the type $(n \| i, k)$ and spanning trees of $G_0$, $G_2$ and
$H^*$, or
2. a spanning forest of $G_0$ of the type $(0 \| n)$ and spanning trees of $G_1$, $G_2$ and $H^*$.

Hence,

$$N_{G^*(0 \| i, j)} = \kappa(G_0)\,N_{G_1(n \| i, k)}\,\kappa(H^*)\,\kappa(G_2) + N_{G_0(0 \| n)}\,\kappa(H^*)\,\kappa(G_1)\,\kappa(G_2) \ge N_{G_0(0 \| n)}\,\kappa(H)\,\kappa(G_1)\,\kappa(G_2) = N_{G(0 \| i, j)}.$$


Case 3. Assume that $i \in \{k + 1, k + 2, \ldots, r\}$. The argument is similar to Case 2
and hence is omitted.


Therefore, combining the three cases given above, the required result follows.

Some new observations related to the bottleneck matrix that will be used in the sequel
are given next. The first one implies that if $i$ and $j$ are adjacent, then the entrywise
difference of row $i$ and row $j$ is bounded above by 1. Their proofs are given by
applying Corollary 9.1.


Proposition 9.1 Let $G$ be a connected graph on vertices $0, 1, \ldots, n$, let $i, j > 0$ be such that $i \sim j$, and let $k > 0$. Also, let $N_1$ be the number of spanning trees of $G$ that have paths of the form $[0, \ldots, j, i, \ldots, k]$ and $N_2$ be the number of spanning trees of $G$ that have paths of the form $[0, \ldots, i, j, \ldots, k]$. Then,

\[ B[G-0](i,k) - B[G-0](j,k) = \frac{N_{G(0,j\|i,k)} - N_{G(0,i\|j,k)}}{\kappa(G)} = \frac{N_1 - N_2}{\kappa(G)}. \]

In particular, $|B[G-0](i,k) - B[G-0](j,k)| \le 1$. Moreover, equality holds if and only if either $N_1 = \kappa(G)$ and $N_2 = 0$, or $N_1 = 0$ and $N_2 = \kappa(G)$. Equivalently, equality holds if and only if $[i,j]$ is a bridge that separates the vertices 0 and $k$.


Proof Using Corollary 9.1, we have

\[ \kappa(G)B[G-0](i,k) - \kappa(G)B[G-0](j,k) = N_{G(0\|i,k)} - N_{G(0\|j,k)}. \]

We first note that only the forests of the type $(0,j\|i,k)$ contribute 1 to $N_{G(0\|i,k)} - N_{G(0\|j,k)}$. Similarly, we see that the forests of the type $(0,i\|j,k)$ contribute $-1$ to $N_{G(0\|i,k)} - N_{G(0\|j,k)}$.

Hence, $N_{G(0\|i,k)} - N_{G(0\|j,k)} = N_{G(0,j\|i,k)} - N_{G(0,i\|j,k)}$. Thus, we only need to show that the above number is at most $\kappa(G)$ in absolute value.

Toward that, we will show that $N_{G(0,j\|i,k)}$ is the number of spanning trees of $G$ that have paths of the form $[0, \ldots, j, i, \ldots, k]$. For that, take a forest of the type $(0,j\|i,k)$ and add the edge $[i,j]$ to get a spanning tree having a path of the form $[0, \ldots, j, i, \ldots, k]$. Conversely, if we take any spanning tree that contains a path of the form $[0, \ldots, j, i, \ldots, k]$, then the forest obtained by deleting the edge $[i,j]$ is a forest of the type $(0,j\|i,k)$.

Similarly, $N_{G(0,i\|j,k)}$ is the number of spanning trees of $G$ that have paths of the form $[0, \ldots, i, j, \ldots, k]$. The conclusion follows.
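The bound in Proposition 9.1 can be checked numerically on a small graph. The sketch below assumes that $B[G-0]$ denotes the inverse of the Laplacian of $G$ with the row and column of vertex 0 deleted (which is what Corollary 9.1 suggests); the example graph and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

def bottleneck_matrix(num_vertices, edges, root=0):
    """Inverse of the graph Laplacian with the row/column of `root` removed.

    Assumption: this is what the text calls B[G - 0].
    """
    lap = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    keep = [x for x in range(num_vertices) if x != root]
    return np.linalg.inv(lap[np.ix_(keep, keep)]), keep

# Hypothetical example: the 4-cycle 0-1-2-3-0 with a pendant vertex 4 attached to 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
B, keep = bottleneck_matrix(5, edges)
pos = {v: idx for idx, v in enumerate(keep)}  # vertex label -> row index in B

# Largest entrywise difference between rows of adjacent vertices i, j > 0.
max_gap = 0.0
for i, j in edges:
    if i in pos and j in pos:
        max_gap = max(max_gap, float(np.max(np.abs(B[pos[i]] - B[pos[j]]))))

# [3, 4] is a bridge separating 0 and 4, so the difference should be exactly 1.
bridge_gap = B[pos[4], pos[4]] - B[pos[3], pos[4]]
```

In this example the rows of adjacent vertices never differ by more than 1, and the bound is attained on the bridge $[3,4]$, in line with the equality case of the proposition.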
Proposition 9.2 Let $G$ be a connected graph on vertices $0, 1, \ldots, n$ and let $i, j, u, v > 0$ be such that $i \sim j$ and $u \sim v$ (some of the vertices $i, j, u, v$ need not be distinct). Then,

\[ B[G-0](i,u) - B[G-0](j,v) = \frac{N_{G(0,v\|i,u)} + N_{G(0,j\|i,u,v)} - N_{G(0,i\|j,v,u)} - N_{G(0,u\|j,v)}}{\kappa(G)}. \]

In particular,

\[ |B[G-0](i,u) - B[G-0](j,v)| \le 1. \]


Proof By Corollary 9.1,

\[ \kappa(G)B[G-0](i,u) - \kappa(G)B[G-0](j,v) = N_{G(0\|i,u)} - N_{G(0\|j,v)}. \tag{9.1} \]

Note that a forest of $G$ of the type $(0\|i,u)$ is

1. either a forest of the type $(0,v\|i,u)$,
2. or a forest of the type $(0,j\|i,u,v)$,
3. or a forest of the type $(0\|i,u,j,v)$.

The forests of $G$ of the type $(0\|j,v)$ can be partitioned similarly. So, Eq. (9.1) reduces to

\[ \kappa(G)\big(B[G-0](i,u) - B[G-0](j,v)\big) = N_{G(0,v\|i,u)} + N_{G(0,j\|i,u,v)} - N_{G(0,u\|j,v)} - N_{G(0,i\|j,v,u)}. \]

This completes the proof of the first part.

For the proof of the second part, note that only the forests of the type $(0,j\|i,u)$ or $(0,v\|i,u)$ can contribute 1 to $N_{G(0\|i,u)} - N_{G(0\|j,v)}$. Hence, we consider two cases depending on whether we have forests of the type $(0,j\|i,u)$ or $(0,v\|i,u)$.


Case 1. If $i = u$ and $j = v$, then by adding the edge $[i,j]$ to a forest of the type $(0,j\|i)$, we obtain a spanning tree of $G$. We are done in this case.

Case 2.1 Either $i \ne u$ or $j \ne v$. Suppose that we have a forest of the type $(0,j\|i,u)$. By adding the edge $[i,j]$, we obtain a spanning tree of $G$. Note that in this tree, the 0-$u$-path contains both the vertices $i$ and $j$.

Case 2.2 Either $i \ne u$ or $j \ne v$. Suppose that we have a forest of the type $(0,v\|i,u)$. By adding the edge $[u,v]$, we obtain a spanning tree of $G$. Note that in this tree, the 0-$u$-path cannot contain both $i$ and $j$, unless $i = u$ and $j = v$. So, these trees are different from the ones obtained in the previous case.


Thus, $|N_{G(0\|i,u)} - N_{G(0\|j,v)}| \le \kappa(G)$ and hence, using Eq. (9.1), the conclusion follows.

As a last observation, we have the following result.


Corollary 9.7 Let $G$ be any connected graph on $0, 1, \ldots, n$. Assume that $0 \sim 1$ and $j \ge 1$. Then, $B[G-0](1,j) \le 1$. Equality holds if and only if every path from $j$ to 0 passes through 1.
Proof Note that by Corollary 9.1, $\kappa(G)B[G-0](1,j) = N_{G(0\|1,j)}$. By inserting the edge $[0,1]$, we see that $N_{G(0\|1,j)} \le \kappa(G)$. Hence, we get the first part.

Suppose that every path from $j$ to 0 passes through 1. Then, the edge $[0,1]$ is a bridge and $N_{G(0\|1)} = N_{G(0\|1,j)}$. But $N_{G(0\|1)} = \kappa(G)$ and hence,

\[ B[G-0](1,j) = \frac{N_{G(0\|1,j)}}{\kappa(G)} = \frac{N_{G(0\|1)}}{\kappa(G)} = 1. \]

Now, let us assume that $B[G-0](1,j) = 1$. Hence, $\kappa(G) = N_{G(0\|1,j)}$. We also note that from every forest of the type $(0\|1,j)$, we can create a unique spanning tree by adding the edge $[0,1]$. As $\kappa(G) = N_{G(0\|1,j)}$, we see that there cannot be any other spanning tree. In other words, every path from $j$ to 0 passes through 1.
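The inequality $B[G-0](1,j) \le 1$ can be illustrated on two small graphs. As before, this is only a sketch: it assumes $B[G-0]$ is the inverse of the Laplacian with the row and column of vertex 0 removed, and both example graphs are hypothetical. In the first graph the edge $[0,1]$ is a bridge, in the second it lies on a triangle.

```python
import numpy as np

def row_of_vertex_one(num_vertices, edges):
    """Row of B[G - 0] indexed by vertex 1 (assumed: inverse reduced Laplacian)."""
    lap = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    reduced = lap[1:, 1:]          # delete row and column of vertex 0
    return np.linalg.inv(reduced)[0]

# Graph A: bridge [0, 1] followed by the triangle 1-2-3.
row_a = row_of_vertex_one(4, [(0, 1), (1, 2), (2, 3), (3, 1)])

# Graph B: triangle 0-1-2 with a pendant vertex 3 attached to 2.
row_b = row_of_vertex_one(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
```

Every entry of `row_a` equals 1, since deleting the bridge $[0,1]$ from any spanning tree gives a forest of the type $(0\|1,j)$; every entry of `row_b` is strictly smaller than 1.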


9.3. The Sliding Principle and its Application 


We are ready to prove our main result which is stated again. 


Theorem 9.2 Let the graphs X,G = P(X), X, and H = P(X), be as shown in 
Example I. Then, B[G — 0] is entrywise dominated by B[H — 0]. 


Proof Note that we need to show that $B[G-0](i,j) \le B[H-0](i,j)$, for all $i, j \ge 1$. To do so, we divide the proof into five cases depending on the location of the vertices $i, j$.

Case 1. Let $i, j \le l$. Then, by Corollary 9.5, we have $B[G-0](i,j) = B[H-0](i,j)$.

Case 2. Let $i \le l$ and $j > l$. Then, by Corollary 9.5,

\[ B[G-0](i,j) = B[G-0](i,l) = B[H-0](i,l) = B[H-0](i,j). \]
Case 3. Let $i = n$. We divide this case into three subcases.

Subcase 1. Let $j = n$. Then, using Corollary 9.5, we have

\[ B[G-0](n,n) = l + B[X-l](n-1,n-1) + 1 = l + 1 + B[X_1-(l+1)](n,n) = B[H-0](n,n). \]

Subcase 2. Let $j > n$. Then, using Corollary 9.5, we have

\[ B[G-0](n,j) = l + B[X-l](n-1,j) \le l + B[X-l](n-1,j) + 1 = l + 1 + B[X_1-(l+1)](n,j) = B[H-0](n,j). \]

Subcase 3. Let $l < j < n$. Then, note that $j \sim j-1$ and $j \ge 2$. Thus, using Corollary 9.5 and Proposition 9.1, we have

\[ B[G-0](n,j) = l + B[X-l](n-1,j) \le l + B[X-l](n-1,j-1) + 1 = l + 1 + B[X_1-(l+1)](n,j) = B[H-0](n,j). \]
Case 4. Let $l < i < n$, $j > l$ and $j \ne n$. Then, note that $B[G-0](i,j) = l + B[X-l](i,j)$.

Subcase 1. Let $i = l+1$. Then, by Corollary 9.7, $B[X-l](l+1,j) \le 1$ and by Corollary 9.5

\[ B[H-0](l+1,j) = l + 1 \ge l + B[X-l](l+1,j) = B[G-0](l+1,j). \]

Subcase 2. Let $l+1 < i < n$, $j \ge l+1$ and $j \ne n$. First, let $j > n$. By Proposition 9.1, $B[X-l](i,j) - B[X-l](i-1,j) \le 1$. Thus

\[ B[H-0](i,j) = l + 1 + B[X_1-(l+1)](i,j) = l + 1 + B[X-l](i-1,j) \ge l + B[X-l](i,j) = B[G-0](i,j). \]

Now, for $l+1 < i \le j < n$, we have $B[X-l](i,j) - B[X-l](i-1,j-1) \le 1$ (using Proposition 9.2). Thus,

\[ B[H-0](i,j) = l + 1 + B[X_1-(l+1)](i,j) = l + 1 + B[X-l](i-1,j-1) \ge l + B[X-l](i,j) = B[G-0](i,j). \]
Case 5. Let $n < i \le j$, that is, both $i, j$ are outside $P$. Then, using Corollary 9.5, we see that

\[ B[H-0](i,j) = l + 1 + B[X_1-(l+1)](i,j) = l + 1 + B[X-l](i,j) = B[G-0](i,j) + 1. \]


This completes the proof of our main result. 
As direct applications of Theorem 9.1, we state a few important results without 
proof. 


Corollary 9.8 Let G and H be graphs as shown in the following picture. Then, 
B[G — 0] is entrywise dominated by B[H — 0]. 


[Figure: Graph G and Graph H. In both graphs, Y is a connected subgraph.]
Corollary 9.9 Let $G$ and $H$ be graphs as shown in the following picture, where $X_{n-k-1}$ is an isomorphic copy of $X$ obtained by sliding $X$ exactly $n-k-1$ times. Then, $B[G-0]$ is entrywise dominated by $B[H-0]$.

[Figure: Graph G and Graph H.]


As mentioned earlier, it was proved in Bapat et al. [2] that if $G$ is a connected graph and $y$ is a Fiedler vector with $|C(G,y)| = 1$ (the size of the characteristic set), then either $C(G,y) = \{u\}$ for a cut vertex $u$, or $C(G,y) = \{[u,v]\}$ for a bridge $[u,v]$. Then, a repeated application of Theorem 9.1 (or that of the above two corollaries) gives us the following result.


Theorem 9.3 Let $G$ be a connected graph and let $y$ be a Fiedler vector with $|C(G,y)| = 1$. Also, assume that $G$ has $r$ trivial blocks ($K_2$) and $k$ nontrivial blocks $B_1, \ldots, B_k$. Then, there is a graph $H$ which has an induced path of length $r$ and whose nontrivial blocks are $B_1, \ldots, B_k$ such that $a(H) \le a(G)$.

By definition, a tree has no nontrivial block and hence, as a final corollary to Theorem 9.1, we obtain the following well-known result, which was first stated and proved by Fiedler.

Corollary 9.10 ([7]) Let $T_n$ be a tree on $n$ vertices. Then, $a(T_n) \ge a(P_n)$.
Chapter 10
Orthogonality for Biadjoints of Operators


T. S. S. R. K. Rao


10.1 Introduction 


Let $X, Y$ be real, non-reflexive Banach spaces. An important problem in operator theory is to consider standard Banach space theoretic operations on the space of operators $\mathcal{L}(X,Y)$ or its subspaces, like the space of compact operators, weakly compact operators, etc., and see if the resulting structure is again (or in) a space of operators, perhaps between different Banach spaces. For example, one important property that has attracted a lot of attention recently (and also has some bearing on the current investigation) is the quotient space problem. For Banach spaces $X, Y, Z$ with $Z \subset Y$, in the space of operators $\mathcal{L}(X,Y)$, when one performs the 'quotient operation' with the subspace $\mathcal{L}(X,Z)$, a natural question to ask is: when is the quotient space $\mathcal{L}(X,Y)/\mathcal{L}(X,Z)$ surjectively isometric to a space of operators? Here a natural target space is $\mathcal{L}(X,Y/Z)$. If $\pi : Y \to Y/Z$ denotes the quotient map, then a possible isometry between the spaces involved is $[T] \to \pi \circ T$, where $[T] \in \mathcal{L}(X,Y)/\mathcal{L}(X,Z)$ denotes the equivalence class. See Monika et al. [7], Botelho et al. [3] and Rao [10].

In this paper we investigate similar aspects for the Banach space theoretic property of Birkhoff-James orthogonality and the operator theoretic property of taking biadjoints of an operator. Let $Z \subset Y$ be a closed subspace and let $y \in Y$. We recall that $y \perp Z$ in the sense of Birkhoff-James if $\|y + z\| \ge \|y\|$ for all $z \in Z$. Deviating from standard notation, we denote the set of all such $y$ by Z*. For a closed subspace $M \subset \mathcal{L}(X,Y)$, with $T \perp M$, we are interested in the question: when is $T^{**}$ orthogonal to the smallest weak*-closed subspace of $\mathcal{L}(X^{**},Y^{**})$ containing $\{S^{**} : S \in M\}$?
Let $Z \subset Y$ be a closed, non-reflexive subspace. We consider the case $M = \mathcal{L}(X,Z)$. We use the canonical embedding of a Banach space $X$ in its bidual $X^{**}$ and also recall the identification of the bidual of the quotient space $(Y/Z)^{**}$ as $Y^{**}/Z^{\perp\perp}$ and $Z^{**}$ as $Z^{\perp\perp} \subset Y^{**}$. Using these embeddings, we have $d(y,Z) = d(y,Z^{\perp\perp})$ for all $y \in Y$. For $T \in \mathcal{L}(X,Y)$, the biadjoint $T^{**} \in \mathcal{L}(X^{**},Y^{**})$, and when $T$ is $Z$-valued, $T^{**}$ takes values in $Z^{\perp\perp}$ (we also note for future use that, when $T$ is compact, $T^{**}$ takes values in $Z$).

So in this case our problem can be formulated as: if $T \perp M$, when is $T^{**} \perp \mathcal{L}(X^{**},Z^{\perp\perp})$? We note that $\mathcal{L}(X^{**},Z^{\perp\perp})$ is closed in the weak*-operator topology of $\mathcal{L}(X^{**},Y^{**})$, and we show that this is the smallest closed subspace containing $M$. Thus, by solving this orthogonality problem, we have a solution to the biadjoint question proposed at the beginning, somewhat analogous to the von Neumann double commutant theorem for $C^*$-algebras (interpreted as a concrete identification of the closure of a space in the weak-operator topology). We refer to Diestel and Uhl [4] for standard results on spaces of operators and tensor products.

This paper is an expanded version of my talk given at ICLAA2021, organized 
by CARAMS, MAHE, Manipal during December 15-17, 2021. I thank the Editors 
of this special volume for their efficient handling of the MS during the pandemic. 
Thanks are also due to the referee for the extensive comments. 


10.2. Main Results 


For a Banach space $X$, we denote the closed unit ball by $X_1$. Following the lead of the well-known Bhatia-Šemrl theorem (Bhatia and Šemrl [2]) on point-wise orthogonality of two orthogonal operators (established in the case of finite dimensional Hilbert spaces), one can ask: if $T \perp \mathcal{L}(X,Z)$, does there exist an extreme point $\Lambda \in X_1^{**}$ such that $\|T\| = \|T^{**}\| = \|T^{**}(\Lambda)\|$ and $T^{**}(\Lambda) \perp Z^{\perp\perp}$?

Most generalizations of the Bhatia-Šemrl theorem for Hilbert spaces or for operators between Banach spaces (see the survey article Paul and Sain [8]) require $T$ to attain its norm, which is a strong requirement even when $X$ is reflexive. On the other hand, it is well-known that if $T$ is a compact operator, $T^{**}$ attains its norm at an extreme point, and a result of Zizler [12] asserts that $\{T \in \mathcal{L}(X,Y) : T^{*} \text{ attains its norm}\}$ is norm dense in $\mathcal{L}(X,Y)$. Thus anticipating $T^{**}$ to attain its norm is a more natural condition for non-reflexive Banach spaces. Now we propose a general interpretation of the Bhatia-Šemrl theorem as 'when does global behavior of an operator lead to a similar local behavior at points where the biadjoint attains its norm?' Our generalization here involves orthogonality of an operator to a space of operators. See Rao [11] for another interpretation.

We will first investigate the sufficient condition, leading to very interesting minimax formulae; see condition (iii) in Proposition 10.1 below. It is well-known and easy to see that $y \perp Z$ if and only if $d(y,Z) = \|y\|$.
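The distance characterization of Birkhoff-James orthogonality is easy to explore numerically in a two-dimensional space. The following sketch is purely illustrative: the vectors, the one-dimensional subspace $Z = \mathrm{span}\{z\}$, the grid search and the function name are all assumptions made for the example, not part of the paper.

```python
import numpy as np

def dist_to_line(y, z, norm_ord, grid):
    """Approximate d(y, span{z}) by minimising ||y + t z|| over a grid of t."""
    return min(float(np.linalg.norm(y + t * z, ord=norm_ord)) for t in grid)

grid = np.linspace(-5.0, 5.0, 2001)   # step 0.005, contains t = 0 and t = -1
z = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

# Sup norm: ||(1 + t, 1)|| = max(|1 + t|, 1) >= 1 = ||y||, so y is orthogonal to Z.
d_sup = dist_to_line(y, z, np.inf, grid)
norm_sup = float(np.linalg.norm(y, ord=np.inf))

# l1 norm: ||(1 + t, 1)|| = |1 + t| + 1 drops to 1 at t = -1, so y is not orthogonal to Z.
d_l1 = dist_to_line(y, z, 1, grid)
norm_l1 = float(np.linalg.norm(y, ord=1))
```

The same pair $(y, Z)$ is orthogonal in the sup norm, where $d(y,Z) = \|y\| = 1$, but not in the $\ell_1$ norm, where $d(y,Z) = 1 < 2 = \|y\|$; the notion depends on the norm, unlike in the Hilbert space case.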
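Proposition 10.1 Suppose $\|T\| = \|T^{**}\| = \|T^{**}(\Lambda)\|$ for some $\Lambda \in X_1^{**}$ and $T^{**}(\Lambda) \perp Z^{\perp\perp}$. The following hold.

(i) $T^{**} \perp \mathcal{L}(X^{**},Z^{\perp\perp})$ as well as $T \perp \mathcal{L}(X,Z)$.
(ii) Now using the distance interpretation of orthogonality for subspaces, in this case we have
\[ \|T\| = \|T^{**}\| = d(T,\mathcal{L}(X,Z)) = d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})). \]
(iii) $d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})) = \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} = d(T^{**}(\Lambda),Z^{\perp\perp})$.

Proof To see (i) and (ii), for any $S \in \mathcal{L}(X^{**},Z^{\perp\perp})$, $\|T^{**} - S\| \ge \|T^{**}(\Lambda) - S(\Lambda)\| \ge \|T^{**}(\Lambda)\| = \|T\|$, since $S(\Lambda) \in Z^{\perp\perp}$. Thus $T^{**} \perp \mathcal{L}(X^{**},Z^{\perp\perp})$ as well as $T \perp \mathcal{L}(X,Z)$. Now using the distance interpretation of orthogonality for subspaces, in this case we have
\[ \|T\| = \|T^{**}\| = d(T,\mathcal{L}(X,Z)) = d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})). \]
To see (iii), by the distance interpretation of orthogonality again,
\[ d(T^{**}(\Lambda),Z^{\perp\perp}) = \|T^{**}(\Lambda)\| \le \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\}. \tag{10.1} \]
Fix $S \in \mathcal{L}(X^{**},Z^{\perp\perp})$ and $\tau \in X_1^{**}$. Then
\[ d(T^{**}(\tau),Z^{\perp\perp}) \le \|T^{**}(\tau) - S(\tau)\| \le \|T^{**} - S\|. \]

Therefore $\sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} \le d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp}))$. Thus by (10.1) and (ii), we get (iii). □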


Remark 10.1 We note that the conclusions of Proposition 10.1 are also valid if we replace bounded operators by compact operators. In Proposition 10.1, suppose $T$ is also a compact operator. Since $T^{**}(\Lambda) \perp Z$, we also have
\[ \|T^{**}(\Lambda)\| = d(T^{**}(\Lambda),Z). \]

Remark 10.2 It is of independent interest to note that, for $T \in \mathcal{L}(X,Y)$, if
\[ d(T,\mathcal{L}(X,Z)) = \sup_{x \in X_1}\{d(T(x),Z)\}, \]
then again (iii) holds. To see this, note that $d(T(x),Z) = d(T^{**}(x),Z^{\perp\perp})$. Now
\[ d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})) \le d(T,\mathcal{L}(X,Z)) \]
and it follows from the proof of (iii) that
\[ \sup_{x \in X_1}\{d(T(x),Z)\} \le \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} \le d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})). \]
Thus (iii) holds.


Our aim in this paper is to partially solve the converse problem, for all domains $X$ and imposing conditions only on the pair $(Y,Z)$.

Since Proposition 10.1 is a solution to an optimization problem (see Rao [9]) on the weak*-compact convex set $X_1^{**}$, we next show that if there is a solution, then there is an extremal solution. We recall that for a compact convex set $K$, $E \subset K$ is an extreme set if, for $x, y \in K$ and $\lambda \in (0,1)$, $\lambda x + (1-\lambda)y \in E$ implies $x, y \in E$. An extreme convex set is called a face. For $x \in E$, if $\mathrm{face}(x)$ denotes the smallest face of $K$ containing $x$, then clearly $E = \bigcup\{\mathrm{face}(x) : x \in E\}$. See Alfsen [1] page 122. We apply these ideas to $K = X_1^{**}$, equipped with the weak*-topology, along with a Krein-Milman type theorem to get the extremal solution.


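Proposition 10.2 Let $\Lambda \in X_1^{**}$ be such that
\[ \|T\| = \|T^{**}(\Lambda)\| \quad\text{and}\quad \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} = d(T^{**}(\Lambda),Z^{\perp\perp}). \]

Then there is an extreme point of $X_1^{**}$ satisfying the same conditions.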


Proof Let $E = \{\tau_0 \in X_1^{**} : \|T^{**}(\tau_0)\| = \|T^{**}\| \text{ and } \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} = d(T^{**}(\tau_0),Z^{\perp\perp})\}$. It is easy to see that, being an intersection of two extreme sets, $E$ is an extreme subset of $X_1^{**}$. Now we use a result of T. Johannesen (Theorem 5.8 in Lima [6]), which says that if the complement $E^c$ of $E$ in $X_1^{**}$ is a union of weak*-compact convex sets, then $E$ has an extreme point of $X_1^{**}$. Note that

\[ E^c = \bigcup_n \Big\{\tau' \in X_1^{**} : \|T^{**}(\tau')\| \le \|T\| - \tfrac{1}{n}\Big\} \;\cup\; \bigcup_n \Big\{\tau' \in X_1^{**} : d(T^{**}(\tau'),Z^{\perp\perp}) \le \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} - \tfrac{1}{n}\Big\}. \]

We next note that if $\pi : Y \to Y/Z$ is the quotient map, then $\pi^{**} : Y^{**} \to Y^{**}/Z^{\perp\perp}$ is the canonical quotient map and $d(T^{**}(\tau),Z^{\perp\perp}) = \|\pi^{**}(T^{**}(\tau))\|$. Thus by the convexity and weak*-lower-semi-continuity of $\|\cdot\|$ and the weak*-continuity of the adjoint map, we see that all the sets in the union are weak*-compact and convex. Therefore $E$ has an extreme point of $X_1^{**}$. □


We recall that $\mathcal{L}(X,Y^{**})$ is the dual of the projective tensor product space $X \otimes_\pi Y^*$. See Diestel and Uhl [4] Chapter VIII. Our next Lemma is folklore and is included here for the sake of completeness. We use the canonical identification of $Y^*/Z^\perp = Z^*$, and $\pi : Y^* \to Y^*/Z^\perp$ denotes the quotient map. For $\Lambda \in X^{**}$ and $y^* \in Y^*$, $\Lambda \otimes \pi(y^*)$ denotes the (weak*-continuous) functional on $\mathcal{L}(X^{**},Z^{\perp\perp})$ defined by $(\Lambda \otimes \pi(y^*))(S) = S(\Lambda)(\pi(y^*))$ for $S \in \mathcal{L}(X^{**},Z^{\perp\perp})$. Thus $(\Lambda \otimes \pi(y^*))(T^{**}) = \Lambda(T^*(\pi(y^*)))$ for any $T \in \mathcal{L}(X,Z)$.
Lemma 10.1 Let $Z \subset Y$ be a closed subspace and let $M = \{T^{**} : T \in \mathcal{L}(X,Z)\}$. Then $M$ is weak*-dense in $\mathcal{L}(X^{**},Z^{\perp\perp}) = (X^{**} \otimes_\pi Y^*/Z^\perp)^*$.

Proof Suppose for $\Lambda \in X^{**}$ and $y^* \in Y^*$, $(\Lambda \otimes \pi(y^*))(T^{**}) = 0$ for all $T \in \mathcal{L}(X,Z)$. For any $z \in Z$ choose $y^*$ such that $\pi(y^*)(z) = 1$. For $x^* \in X^*$ and for the operator $T^* = z \otimes x^*$, we have $\Lambda(T^*(\pi(y^*))) = 0 = \Lambda(x^*)$. Therefore $\Lambda = 0$. Hence by an application of the separation theorem for the weak*-topology, $M$ is weak*-dense in $\mathcal{L}(X^{**},Z^{\perp\perp})$. □


We recall that if $\pi : Y \to Y/Z$ is the quotient map, then $\pi^*$ is the inclusion map of $Z^\perp$ in $Y^*$. In what follows, we use a contractive projection to achieve the required compact 'lifting'. See Rao [10] for alternative approaches.


Theorem 10.1 Let $X$ be a Banach space and let $Z \subset Y$ be a closed finite codimensional subspace such that there is a contractive linear projection $P : Y \to Y$ such that $\ker(P) = Z$. Let $T \in \mathcal{L}(X,Y)$ and $T \perp \mathcal{L}(X,Z)$. Then there is an extreme point $\Lambda \in X_1^{**}$ such that $\|T\| = \|T^{**}\| = \|T^{**}(\Lambda)\|$ and $T^{**}(\Lambda) \perp Z^{\perp\perp}$.


Proof Since $Z$ is of finite codimension, $\pi \circ T : X \to Y/Z$ is a compact operator. It is well-known (see Harmand et al. [5] page 226) that there are extreme points $\Lambda \in X_1^{**}$, $y^* \in Z_1^\perp$ such that
\[ \|\pi \circ T\| = \Lambda(T^*(y^*)) = T^{**}(\Lambda)(y^*) \le d(T^{**}(\Lambda),Z^{\perp\perp}) \le \sup_{\tau \in X_1^{**}}\{d(T^{**}(\tau),Z^{\perp\perp})\} \le d(T^{**},\mathcal{L}(X^{**},Z^{\perp\perp})) \le d(T,\mathcal{L}(X,Z)) = \|T\|. \]
Now let $P' : Y/Z \to Y$ defined by $P'(\pi(y)) = P(y)$ be the canonical embedding. Let $S = P' \circ \pi \circ T$. Then $\|S\| \le \|\pi \circ T\|$. Also note that $\pi \circ S = \pi \circ T$. To see this, note that for any $x \in X$, $S(x) - T(x) = P'(\pi(T(x))) - T(x) = P(T(x)) - T(x) \in Z$. Thus $\|S\| = \|\pi \circ T\|$ and $d(T,\mathcal{L}(X,Z)) = d(S,\mathcal{L}(X,Z)) \le \|S\| = \|\pi \circ T\|$. Therefore equality holds in the inequalities and we have the required conclusion. □
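The 'lifting' step of the proof, namely $d(T,\mathcal{L}(X,Z)) = \|\pi \circ T\|$ via $S = P' \circ \pi \circ T$, can be made concrete in a finite dimensional Euclidean setting. The sketch below is only an illustration under strong simplifying assumptions that are not in the paper: $X = Y = \mathbb{R}^3$ with the Euclidean norm, $Z = \mathrm{span}\{e_1, e_2\}$, and $P$ the orthogonal projection onto $\mathrm{span}\{e_3\}$ (so $\ker P = Z$ and $\|P\| = 1$). In this setting the quotient norm of $\pi(y)$ equals $\|P y\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3))           # an arbitrary operator on R^3

P = np.diag([0.0, 0.0, 1.0])              # contractive projection with kernel Z
S = P @ T                                 # plays the role of P' o pi o T
quotient_norm = np.linalg.norm(S, 2)      # ||pi o T|| in this Euclidean model

# T - S takes values in Z, so S is at distance ||S|| at most from T - (Z-valued).
z_valued_part = T - S
assert np.allclose(z_valued_part[2], 0.0)  # third coordinate vanishes: Z-valued
dist_upper = np.linalg.norm(T - z_valued_part, 2)

# No Z-valued operator R can come closer to T than ||pi o T||.
lower_bound_holds = True
for _ in range(1000):
    R = np.zeros((3, 3))
    R[:2, :] = 3.0 * rng.standard_normal((2, 3))   # rows 1-2 free, row 3 zero
    if np.linalg.norm(T - R, 2) < quotient_norm - 1e-12:
        lower_bound_holds = False
```

Here `dist_upper` coincides with `quotient_norm`, and random $Z$-valued perturbations never beat it, so the distance to $\mathcal{L}(X,Z)$ is attained at $T - S$. The example does not assume $T \perp \mathcal{L}(X,Z)$; it only illustrates the distance formula used in the proof.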


Remark 10.3 We have used the finite codimensionality to get $\pi \circ T$ compact. The theorem also holds by assuming $\pi \circ T$ is compact. Also note that since $P$ is a contractive projection, $\|(x - P(x)) - P(y)\| \ge \|P(y)\|$ for all $x, y \in Y$, so in Theorem 10.1 the range of $P$ is contained in Z*. As noted in Remark 10.1, if one considers these questions only in the class of compact operators, the minimax formulae from Proposition 10.1 are still valid. It was shown recently in Rao [11] that these can be used to compute the directional derivative of the operator norm at these operators.
11.1 Introduction 


In this paper, we consider the partitioned linear model $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$, or shortly denoted as

\[ \mathscr{M}_{12} = \{y, X\beta, V\} = \{y, X_1\beta_1 + X_2\beta_2, V\}. \tag{11.1} \]

In (11.1), $y$ is an $n$-dimensional observable variable, and $\varepsilon$ is an unobservable random error with a known covariance matrix $\mathrm{cov}(\varepsilon) = V = \mathrm{cov}(y)$ and expectation $E(\varepsilon) = 0$. The matrix $X$ is a known $n \times p$ matrix, i.e., $X \in \mathbb{R}^{n \times p}$, partitioned columnwise as $X = (X_1 : X_2)$. Vector $\beta = (\beta_1', \beta_2')' \in \mathbb{R}^p$ is a vector of fixed (but unknown) parameters; here symbol $'$ stands for the transpose. We will also denote $\mu = X\beta$, $\mu_i = X_i\beta_i$, $i = 1, 2$.

As for notations, the symbols $r(A)$, $A^-$, $A^+$, $\mathscr{C}(A)$, and $\mathscr{C}(A)^\perp$ denote, respectively, the rank, a generalized inverse, the Moore-Penrose inverse, the column space, and the orthogonal complement of the column space of the matrix $A$. By $A^\perp$, we denote any matrix satisfying $\mathscr{C}(A^\perp) = \mathscr{C}(A)^\perp$. Furthermore, we will write $P_A = P_{\mathscr{C}(A)} = AA^+ = A(A'A)^-A'$ to denote the orthogonal projector onto $\mathscr{C}(A)$ and $Q_A = I - P_A$, where $I$ is an identity matrix of conformable order. In particular, we denote

\[ M = I_n - P_X, \quad M_i = I_n - P_i, \quad P_i = P_{X_i}, \quad i = 1, 2. \]
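As a small numerical companion to this notation, the projectors $P_A = AA^+$ and $M = I_n - P_X$ can be formed with NumPy's Moore-Penrose inverse. The matrices below are random and purely illustrative, and the helper name `proj` is hypothetical; the explicit `rcond` cut-off is a choice made here so that the same helper also behaves well for rank-deficient inputs.

```python
import numpy as np

def proj(A, rcond=1e-10):
    """Orthogonal projector onto the column space of A, computed as A A^+."""
    return A @ np.linalg.pinv(A, rcond=rcond)

rng = np.random.default_rng(1)
n = 6
X1 = rng.standard_normal((n, 2))
X2 = rng.standard_normal((n, 2))
X = np.hstack([X1, X2])                   # X = (X1 : X2)

P_X = proj(X)
M = np.eye(n) - P_X                       # M = I_n - P_X
P_1 = proj(X1)
M_1 = np.eye(n) - P_1                     # M_1 = I_n - P_{X_1}

# The alternative expression A (A'A)^- A' with the Moore-Penrose inverse as g-inverse.
P_X_alt = X @ np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T
```

The projector is symmetric and idempotent, $MX = 0$, and both expressions for $P_X$ agree up to rounding.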


In addition to the full or big model $\mathscr{M}_{12}$, we will consider the small model $\mathscr{M}_1 = \{y, X_1\beta_1, V\}$ and the reparametrized model

\[ \mathscr{M}_{12t} = \{y, X_t\beta, V\} = \{y, X_1\beta_1 + M_1X_2\beta_2, V\}, \quad X_t = (X_1 : M_1X_2). \]

Under the model $\mathscr{M}_{12}$, the statistic $Gy$, where $G$ is an $n \times n$ matrix, is the best linear unbiased estimator, BLUE, of $\mu = X\beta$ if $Gy$ is unbiased, i.e., $GX = X$, and it has the smallest covariance matrix in the Löwner sense among all unbiased linear estimators of $X\beta$. For the proof of the following fundamental lemma, see, e.g., Rao [15, p. 282].


Lemma 11.1 Consider the general linear model $\mathscr{M}_{12} = \{y, X\beta, V\}$. Then the statistic $Gy$ is the BLUE for $\mu = X\beta$ if and only if $G$ satisfies the equation

\[ G(X : VX^\perp) = (X : 0). \tag{11.2} \]

One well-known solution for $G$ in (11.2) (which is always solvable) is

\[ G_{12} = X(X'W^-X)^-X'W^+, \quad\text{where } W = V + XX'. \tag{11.3} \]

It is worth noting, see, e.g., Puntanen et al. [11, Proposition 12.1 and 15.2], that $G_{12}$, which is invariant for the choices of the generalized inverses denoted by "$^-$", can be expressed as

\[ G_{12} = X(X'W^-X)^-X'W^+ = P_W - VM(MVM)^-MP_W = P_W - VM(MVM)^+ = P_W - VM(MVM)^+M. \tag{11.4} \]
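The fundamental equation (11.2) and the two expressions for $G_{12}$ in (11.3) and (11.4) can be verified numerically. The sketch below uses random data and, as a simplifying assumption made only for numerical convenience, a positive definite $V$; the Moore-Penrose inverse is used as the particular choice of generalized inverse, and $X^\perp$ is taken to be $M$.

```python
import numpy as np

def pinv(A):
    """Moore-Penrose inverse with an explicit cut-off for rank-deficient input."""
    return np.linalg.pinv(A, rcond=1e-10)

rng = np.random.default_rng(2)
n, p = 6, 3
X = rng.standard_normal((n, p))
L = rng.standard_normal((n, n))
V = L @ L.T + np.eye(n)                   # assumed positive definite covariance

M = np.eye(n) - X @ pinv(X)               # M = I - P_X, a choice of X-perp
W = V + X @ X.T
W_plus = pinv(W)
P_W = W @ W_plus

# (11.3): G12 = X (X' W^- X)^- X' W^+, with Moore-Penrose inverses throughout.
G12 = X @ pinv(X.T @ W_plus @ X) @ X.T @ W_plus

# (11.4): G12 = P_W - V M (M V M)^+ M.
G12_alt = P_W - V @ M @ pinv(M @ V @ M) @ M

# Left- and right-hand side of the fundamental BLUE equation (11.2).
lhs = G12 @ np.hstack([X, V @ M])
rhs = np.hstack([X, np.zeros((n, n))])
```

Both `lhs` and `rhs` agree, and `G12` coincides with `G12_alt`, up to rounding error.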


The general solution for $G$ can be expressed, for example, as

\[ G_0 = G_{12} + NQ_{(X:V)}, \quad\text{where } N \in \mathbb{R}^{n \times n} \text{ is free to vary,} \]

and $Q_{(X:V)} = I_n - P_{(X:V)}$. However, the equality

\[ G_0y = (G_{12} + NQ_{(X:V)})y \quad\text{for all } y \in \mathscr{C}(X : V) \]

holds when the model is consistent in the sense that $y \in \mathscr{C}(X : V)$ with probability 1. Notice that

\[ \mathscr{C}(X : V) = \mathscr{C}(X : VM) = \mathscr{C}(X) \oplus \mathscr{C}(VM) = \mathscr{C}(X) \boxplus \mathscr{C}(MV), \]

where "$\oplus$" refers to the direct sum and "$\boxplus$" refers to the direct sum of orthogonal subspaces. The solution for $G$ in (11.2) is unique if and only if $\mathscr{C}(X : V) = \mathbb{R}^n$.
The following lemma will be repeatedly used in our considerations.
The following lemma will be repeatedly used in our considerations. 


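Lemma 11.2 Consider $X = (X_1 : X_2)$ and let $M_1 = I_n - P_{X_1}$. Then

1. $\mathscr{C}(X_1 : X_2) = \mathscr{C}(X_1 : M_1X_2)$, i.e., $\mathscr{C}(X) = \mathscr{C}(X_1) \boxplus \mathscr{C}(M_1X_2)$,
2. $M = I_n - P_X = I_n - (P_{X_1} + P_{M_1X_2}) = M_1Q_{M_1X_2} = Q_{M_1X_2}M_1$,
3. $Q_{(X:V)} = M - P_{MV} = M(I_n - P_{MV}) = MQ_{MV} = Q_{MV}M$, and
4. $r(M_1X_2) = r(X_2) - \dim \mathscr{C}(X_1) \cap \mathscr{C}(X_2)$.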
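The identities of Lemma 11.2 lend themselves to a quick numerical check. In the sketch below the matrices are random and hypothetical; $X_2$ is deliberately built to share one direction with $\mathscr{C}(X_1)$ and $V$ is chosen singular, so that the rank formula and the projector identities are exercised in a non-trivial case. Explicit tolerances are passed to `pinv` and `matrix_rank` because the matrices involved are rank deficient.

```python
import numpy as np

def proj(A, rcond=1e-10):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.pinv(A, rcond=rcond)

def rank(A):
    return int(np.linalg.matrix_rank(A, tol=1e-8))

rng = np.random.default_rng(3)
n = 7
X1 = rng.standard_normal((n, 2))
# X2 repeats the first column of X1, so C(X1) and C(X2) intersect in a line.
X2 = np.column_stack([X1[:, 0], rng.standard_normal(n)])
X = np.hstack([X1, X2])
L = rng.standard_normal((n, 3))
V = L @ L.T                               # singular nonnegative definite matrix

I = np.eye(n)
M1 = I - proj(X1)
M = I - proj(X)
Q_M1X2 = I - proj(M1 @ X2)
Q_XV = I - proj(np.hstack([X, V]))

# dim C(X1) ∩ C(X2) = r(X1) + r(X2) - r(X1 : X2)
dim_intersection = rank(X1) + rank(X2) - rank(X)
```

With these data, item 2 reads `M == M1 @ Q_M1X2`, item 3 reads `Q_XV == M - proj(M @ V)`, and item 4 gives $r(M_1X_2) = 2 - 1 = 1$.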


Consider now the following linear models:

\[ \mathscr{M}_1(V_0) = \{y, X_1\beta_1, V_0\}, \quad \mathscr{M}_{12}(V_0) = \{y, X\beta, V_0\}, \]
\[ \mathscr{M}_1(V) = \{y, X_1\beta_1, V\}, \quad \mathscr{M}_{12}(V) = \{y, X\beta, V\}. \]

For the property that every representation of the BLUE of $\mu$ under $\mathscr{M}_{12}(V_0)$ remains BLUE of $\mu$ under $\mathscr{M}_{12}(V)$, we can use the short notation

\[ \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(V_0))\} \subseteq \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(V))\}. \tag{11.5} \]

The notation $\{\mathrm{BLUE}(\mu \mid \mathscr{M})\}$ refers to the set of all representations of the BLUE of $\mu$ under the model $\mathscr{M}$, which can be symbolically denoted as

\[ \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(V))\} = \{Gy : G(X : VM) = (X : 0)\}. \]

By Rao [14, Theorem 5.2], the statement (11.5) is equivalent to

\[ \mathscr{C}(VM) \subseteq \mathscr{C}(V_0M). \tag{11.6} \]

By Rao's Theorem 5.3, (11.6) is equivalent to the fact that $V$ can be expressed as

\[ V = XAA'X' + V_0MBB'MV_0 \tag{11.7} \]

for some $A$ and $B$. According to Rao's Corollary 5.3.1, a further equivalent form is

\[ V = V_0 + XCC'X' + V_0MDD'MV_0, \tag{11.8} \]

for some $C$ and $D$. As a matter of fact, Rao uses symmetric matrices instead of $AA'$ and $BB'$, etc. But it is slightly simpler to use nonnegative definite matrices $AA'$ and $BB'$, etc. For further references regarding (11.6)-(11.8), see also Rao [13, Lemma 5], Rao [15, p. 289], and Mitra and Moore [10, Theorem 4.1-4.2].
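The sufficiency direction of this equivalence can be illustrated numerically: if $V$ is built in the form (11.8), then $\mathscr{C}(VM) \subseteq \mathscr{C}(V_0M)$ and a BLUE computed under $V_0$ still satisfies the fundamental equation under $V$. The data below are random and hypothetical, $V_0$ is taken positive definite purely for numerical convenience, and `blue_matrix` is an illustrative helper implementing (11.3) with Moore-Penrose inverses.

```python
import numpy as np

def pinv(A):
    return np.linalg.pinv(A, rcond=1e-10)

def blue_matrix(X, V):
    """One representation G of the BLUE of X beta, following (11.3)."""
    W_plus = pinv(V + X @ X.T)
    return X @ pinv(X.T @ W_plus @ X) @ X.T @ W_plus

rng = np.random.default_rng(4)
n, p = 6, 2
X = rng.standard_normal((n, p))
L0 = rng.standard_normal((n, n))
V0 = L0 @ L0.T + np.eye(n)                # assumed positive definite
M = np.eye(n) - X @ pinv(X)

C = rng.standard_normal((p, p))
D = rng.standard_normal((n, 2))
V = V0 + X @ C @ C.T @ X.T + V0 @ M @ D @ D.T @ M @ V0   # form (11.8)

# Column space inclusion (11.6): project V M onto the complement of C(V0 M).
P_V0M = (V0 @ M) @ pinv(V0 @ M)
inclusion_gap = np.linalg.norm((np.eye(n) - P_V0M) @ V @ M)

# The BLUE under V0 also solves the fundamental equation under V.
G0 = blue_matrix(X, V0)
residual = np.linalg.norm(G0 @ np.hstack([X, V @ M]) - np.hstack([X, np.zeros((n, n))]))
```

Both `inclusion_gap` and `residual` are zero up to rounding, as (11.5) and (11.6) predict for this $V$.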

As noted by Mitra and Moore [10, p. 148] and Rao [15, p. 289], Rao in Theorem 5.3 of his 1971 paper had an unnecessary condition $\mathscr{C}(X : V_0M) = \mathbb{R}^n$ which was not used in the proofs. Notice that $\mathscr{C}(X : V_0) = \mathscr{C}(X : V_0M)$ and thereby $\mathscr{C}(X : V_0M) = \mathbb{R}^n$ means that the BLUE of $\mu$ has a unique representation.
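Let us define the set $\mathscr{V}_{12}$ of nonnegative definite matrices as follows:

\[ V \in \mathscr{V}_{12} \iff V = V_0 + XAA'X' + V_0MBB'MV_0, \tag{11.9} \]

for some $A$ and $B$. The corresponding set for the small model is defined as

\[ V \in \mathscr{V}_1 \iff V = V_0 + X_1CC'X_1' + V_0M_1DD'M_1V_0, \]

for some $C$ and $D$. In view of $\mathscr{C}(X_t) = \mathscr{C}(X_1 : M_1X_2) = \mathscr{C}(X)$, it is clear that

\[ \mathrm{BLUE}(\mu \mid \mathscr{M}_{12}) = \mathrm{BLUE}(\mu \mid \mathscr{M}_{12t}). \]

Thereby $\mathscr{V}_{12} = \mathscr{V}_{12t}$, and we can equally well replace (11.9) with

\[ V \in \mathscr{V}_{12} \iff V = V_0 + X_tAA'X_t' + V_0MBB'MV_0. \]

In Sects. 11.2 and 11.3, we are interested in the relations between the sets $\mathscr{V}_1$ and $\mathscr{V}_{12}$, i.e., we want to find necessary and sufficient conditions for $\mathscr{V}_1 \subseteq \mathscr{V}_{12}$ and for $\mathscr{V}_{12} \subseteq \mathscr{V}_1$.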


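11.2 The Case when $V_0$ is the Identity Matrix


Putting $V_0 = I$, the identity matrix of order $n \times n$, means that we have the models $\mathscr{M}_{12}(I) = \{y, X\beta, I\}$ and $\mathscr{M}_{12}(V) = \{y, X\beta, V\}$ and

\[ \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(I))\} = \{By : B(X : M) = (X : 0)\} = \{P_Xy\}, \]

i.e., the unique representation of the BLUE of $\mu = X\beta$ under $\mathscr{M}_{12}(I)$ is $P_Xy$, the ordinary least squares estimator, OLSE, of $\mu$. Thus, the inclusion

\[ \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(I))\} \subseteq \{\mathrm{BLUE}(\mu \mid \mathscr{M}_{12}(V))\} \]

can be interpreted as the equality

\[ \mathrm{OLSE}(\mu) = \mathrm{BLUE}(\mu) \quad\text{under } \mathscr{M}_{12}(V). \]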
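Using $X_t = (X_1 : M_1X_2)$ in the characterization we consider the sets

\[ V \in \mathscr{V}_{12}(I) \iff V = I + X_tAA'X_t' + MBB'M, \]
\[ V \in \mathscr{V}_1(I) \iff V = I + X_1CC'X_1' + M_1DD'M_1. \]

Notice that if $V_0 = I$, then we use notations $\mathscr{V}_1(I)$ and $\mathscr{V}_{12}(I)$. In the general case, we use notations $\mathscr{V}_1$ and $\mathscr{V}_{12}$. Take now an arbitrary

\[ V = I + X_tAA'X_t' + MBB'M \in \mathscr{V}_{12}(I). \]

Denoting

\[ A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}, \quad AA' = \begin{pmatrix} A_1A_1' & A_1A_2' \\ A_2A_1' & A_2A_2' \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \]

we have, for conformable partitioning of $A$,

\[ X_tAA'X_t' = X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + X_1A_{12}X_2'M_1 + M_1X_2A_{21}X_1'. \]

Now $V = I + X_tAA'X_t' + MBB'M \in \mathscr{V}_1(I)$ if and only if

\[ \mathrm{OLSE}(\mu_1) = \mathrm{BLUE}(\mu_1) \quad\text{under } \mathscr{M}_1(V), \]

which holds, see, e.g., Rao [12] and Zyskind [16], if and only if any of the following equivalent statements hold:

\[ P_1VM_1 = 0, \quad \mathscr{C}(VM_1) \subseteq \mathscr{C}(M_1), \quad P_1V = VP_1. \tag{11.10} \]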


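Applying (11.10) yields

\[ X_1A_{12}X_2'M_1 = 0. \]

Thus, in view of $M = M_1Q_{M_1X_2}$, the matrix $V$ can be written as

\[ \begin{aligned} V &= I + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + MBB'M \\ &= I + X_1A_{11}X_1' + M_1(X_2A_{22}X_2' + Q_{M_1X_2}BB'Q_{M_1X_2})M_1 \\ &=: I + X_1A_{11}X_1' + M_1SS'M_1. \end{aligned} \tag{11.11} \]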


What about the reverse relation $\mathscr{V}_1(I) \subseteq \mathscr{V}_{12}(I)$? Take an arbitrary

\[ V = I + X_1CC'X_1' + M_1DD'M_1 \in \mathscr{V}_1(I). \tag{11.12} \]

Now $V \in \mathscr{V}_{12}(I)$ if and only if $P_XVM = 0$, which holds if and only if

\[ P_XM_1DD'M_1M = 0, \quad\text{i.e.,}\quad P_{M_1X_2}DD'M = 0. \tag{11.13} \]

Requesting that $V$ has the representation of the type (11.11) and, at the same time, $V$ has the representation (11.12) with (11.13) holding, and observing that $P_XM_1SS'M_1M = 0$, we can conclude that $\mathscr{V}_{12}(I) = \mathscr{V}_1(I)$ if and only if $V$ is of the form (11.11).


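Theorem 11.1 The following statements hold.

1. $\mathscr{V}_{12}(I) \subseteq \mathscr{V}_1(I)$ $\iff$ the matrix $V \in \mathscr{V}_{12}$ can be expressed as
\[ V = I + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + MBB'M \tag{11.14} \]
for some matrices $A_1$, $A_2$ and $B$; $A_{ij} = A_iA_j'$, $i, j = 1, 2$.
2. $\mathscr{V}_1(I) \subseteq \mathscr{V}_{12}(I)$ $\iff$ the matrix $V \in \mathscr{V}_1$ can be expressed as
\[ V = I + X_1CC'X_1' + M_1DD'M_1 \]
for some matrices $C$ and $D$, where
\[ P_XM_1DD'M_1M = 0, \quad\text{i.e.,}\quad P_{M_1X_2}DD'M = 0. \]
3. $V \in \mathscr{V}_1(I) \cap \mathscr{V}_{12}(I)$ if and only if $V$ can be expressed as in (11.14), in other words,
\[ \mathscr{V}_{12}(I) = \mathscr{V}_1(I) \iff \mathscr{V}_{12}(I) \subseteq \mathscr{V}_1(I). \]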
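A covariance matrix of the form (11.14) can be generated at random to see the OLSE-BLUE conditions at work in both the small and the full model. The following sketch is illustrative only: the matrices $A_1$, $A_2$, $B$ are random, the helper `proj` is hypothetical, and since $V$ is positive definite here the BLUE is computed by the usual generalized least squares formula.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.pinv(A, rcond=1e-10)

rng = np.random.default_rng(5)
n = 7
X1 = rng.standard_normal((n, 2))
X2 = rng.standard_normal((n, 2))
X = np.hstack([X1, X2])
I = np.eye(n)
P1, PX = proj(X1), proj(X)
M1, M = I - P1, I - PX

A1 = rng.standard_normal((2, 2))
A2 = rng.standard_normal((2, 2))
B = rng.standard_normal((n, 3))
A11, A22 = A1 @ A1.T, A2 @ A2.T

# V of the form (11.14).
V = I + X1 @ A11 @ X1.T + M1 @ X2 @ A22 @ X2.T @ M1 + M @ B @ B.T @ M

small_model_gap = np.linalg.norm(P1 @ V @ M1)   # condition (11.10): P_1 V M_1 = 0
full_model_gap = np.linalg.norm(PX @ V @ M)     # P_X V M = 0

# V is positive definite, so the BLUE of X beta is the GLS fit.
V_inv = np.linalg.inv(V)
G = X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv
```

Both gaps vanish up to rounding and `G` coincides with $P_X$, i.e., OLSE equals BLUE under this $V$ in the full model, while $P_1VM_1 = 0$ gives the same equality in the small model.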


Remark 11.1 Let us have a look at the situation when we use $X$ instead of $X_t$ in characterizing $\mathscr{V}_{12}$. In this case

\[ XAA'X' = X_1A_{11}X_1' + X_2A_{22}X_2' + X_1A_{12}X_2' + X_2A_{21}X_1'. \tag{11.15} \]

Now the inclusion $\mathscr{V}_{12}(I) \subseteq \mathscr{V}_1(I)$ holds if and only if

\[ V = I + XAA'X' + MBB'M \in \mathscr{V}_1(I), \]

i.e., $P_1VM_1 = 0$, i.e.,

\[ P_1XAA'X'M_1 = P_1X_2A_{22}X_2'M_1 + X_1A_{12}X_2'M_1 = 0. \tag{11.16} \]

If $\mathscr{C}(X_1) \cap \mathscr{C}(X_2) = \{0\}$ then (11.16) simplifies to $X_1A_{12}X_2' = -P_1X_2A_{22}X_2'$, in which case

\[ \begin{aligned} XAA'X' &= X_1A_{11}X_1' + X_2A_{22}X_2' - P_1X_2A_{22}X_2' - X_2A_{22}X_2'P_1 \\ &= X_1A_{11}X_1' + M_1X_2A_{22}X_2' - X_2A_{22}X_2'P_1. \end{aligned} \]

Thus if $\mathscr{C}(X_1) \cap \mathscr{C}(X_2) = \{0\}$, then $\mathscr{V}_{12}(I) \subseteq \mathscr{V}_1(I)$ if and only if $V \in \mathscr{V}_{12}(I)$ can be expressed as

\[ V = I + X_1A_{11}X_1' + M_1X_2A_{22}X_2' - X_2A_{22}X_2'P_1 + MBB'M. \]

The use of (11.15) seems to yield rather complicated characterizations. Let's consider an alternative formulation. Namely, it is clear, see, e.g., Puntanen et al. [11, p. 219], that $V \in \mathscr{V}_{12}(I)$ if and only if $V$ can be expressed as

\[ V_a = I + P_XAA'P_X + MBB'M \]

for some $A$ and $B$. Now $V_a \in \mathscr{V}_1(I)$ if and only if $P_1V_aM_1 = 0$, which holds if and only if $P_1AA'P_{M_1X_2} = 0$, yielding the expression

\[ V_{a1} = I + P_1AA'P_1 + P_{M_1X_2}AA'P_{M_1X_2} + MBB'M, \]

which is interestingly related to (11.14). □


11.3. The General Case of $V_0$


Consider next the case of arbitrary nonnegative definite $V_0$ and the two sets:

\[ V \in \mathscr{V}_{12} \iff V = V_0 + X_tAA'X_t' + V_0MBB'MV_0, \tag{11.17a} \]
\[ V \in \mathscr{V}_1 \iff V = V_0 + X_1CC'X_1' + V_0M_1DD'M_1V_0. \tag{11.17b} \]

Take an arbitrary $V_* \in \mathscr{V}_{12}$ and ask when does the following hold:

\[ V_* = V_0 + X_tAA'X_t' + V_0MBB'MV_0 \in \mathscr{V}_1\,? \tag{11.18} \]

In other words, if we have an arbitrary $V_*$ of the form (11.17a), what is needed so that $V_*$ can be expressed in the form (11.17b) for some $C$ and $D$? Shortly denoted, we want to find a necessary and sufficient condition for the inclusion $\mathscr{V}_{12} \subseteq \mathscr{V}_1$. By Rao [14, Theorem 5.2], the condition (11.18) is equivalent to

\[ \mathscr{C}(V_*M_1) \subseteq \mathscr{C}(V_0M_1). \tag{11.19} \]

In view of $M = M_1Q_{M_1X_2}$, we have

\[ \mathscr{C}(V_0MBB'MV_0M_1) \subseteq \mathscr{C}(V_0M_1) \]

and hence the inclusion (11.19) holds if and only if

\[ \mathscr{C}(X_tAA'X_t'M_1) \subseteq \mathscr{C}(V_0M_1). \]

Notice that

\[ X_tAA'X_t'M_1 = X_1A_{12}X_2'M_1 + M_1X_2A_{22}X_2'M_1. \]
Consider a special situation when

\[ \mathscr{C}(M_1X_2A_{22}X_2'M_1) = \mathscr{C}(M_1X_2A_2) \subseteq \mathscr{C}(V_0M_1). \tag{11.20} \]

Requesting (11.20) to hold for any $A_2$, it becomes

\[ \mathscr{C}(M_1X_2) \subseteq \mathscr{C}(V_0M_1). \tag{11.21} \]

Then, in light of $\mathscr{C}(X_1) \cap \mathscr{C}(V_0M_1) = \{0\}$, and assuming that (11.21) holds, the inclusion

\[ \mathscr{C}(X_tAA'X_t'M_1) = \mathscr{C}(X_1A_{12}X_2'M_1 + M_1X_2A_{22}X_2'M_1) \subseteq \mathscr{C}(V_0M_1) \]

holds if and only if

\[ X_1A_{12}X_2'M_1 = 0, \]

which further is equivalent to

\[ X_tAA'X_t' = X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1. \]

Thus under (11.21) the inclusion $\mathscr{V}_{12} \subseteq \mathscr{V}_1$ holds if and only if $V \in \mathscr{V}_{12}$ can be expressed as

\[ V = V_0 + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0. \]

So we have proved the following.


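Theorem 11.2 Let us denote

\[ X_t = (X_1 : M_1X_2), \quad A = (A_1' : A_2')', \quad A_{ij} = A_iA_j', \quad i, j = 1, 2. \]

1. Inclusion $\mathscr{V}_{12} \subseteq \mathscr{V}_1$ holds if and only if the matrix $V \in \mathscr{V}_{12}$ can be expressed as
\[ V = V_0 + X_tAA'X_t' + V_0MBB'MV_0, \]
where
\[ \mathscr{C}(X_tAA'X_t'M_1) \subseteq \mathscr{C}(V_0M_1), \quad\text{i.e.,}\quad \mathscr{C}(X_1A_{12}X_2'M_1 + M_1X_2A_{22}X_2'M_1) \subseteq \mathscr{C}(V_0M_1), \]
for some $A$ and $B$.

2. Suppose that
\[ \mathscr{C}(M_1X_2) \subseteq \mathscr{C}(V_0M_1). \]
Then $\mathscr{V}_{12} \subseteq \mathscr{V}_1$ holds if and only if the matrix $V \in \mathscr{V}_{12}$ can be expressed as
\[ V = V_0 + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0. \]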
What about the reverse relation $\mathscr{V}_1 \subseteq \mathscr{V}_{12}$? Take an arbitrary

\[ V_\# = V_0 + X_1CC'X_1' + V_0M_1DD'M_1V_0 \in \mathscr{V}_1. \]

By Rao [14, Theorem 5.2], the inclusion $\mathscr{V}_1 \subseteq \mathscr{V}_{12}$ is equivalent to

\[ \mathscr{C}(V_\#M) \subseteq \mathscr{C}(V_0M). \tag{11.22} \]

Now we have

\[ V_\#M = V_0M + V_0M_1DD'M_1V_0M \]

and thus (11.22) holds if and only if

\[ \mathscr{C}(V_0M_1DD'M_1V_0M) \subseteq \mathscr{C}(V_0M). \tag{11.23} \]

Consider the equality $\mathscr{V}_1 = \mathscr{V}_{12}$, i.e., we should characterize the set of matrices $V$ belonging to $\mathscr{V}_1 \cap \mathscr{V}_{12}$. In other words, we should combine (11.23) and Theorem 11.2. Suppose that $\mathscr{C}(M_1X_2) \subseteq \mathscr{C}(V_0M_1)$, so that

\[ M_1X_2A_2 = V_0M_1T \quad\text{for some } T. \tag{11.24} \]

Now our request is that the following two expressions both hold:

1. $V = V_0 + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0$, and
2. $V = V_0 + X_1CC'X_1' + V_0M_1DD'M_1V_0$ subject to (11.23).

In light of (11.24), the matrix $V$ in 1 can be expressed as

\[ \begin{aligned} V &= V_0 + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0 \\ &= V_0 + X_1A_{11}X_1' + V_0M_1(TT' + Q_{M_1X_2}BB'Q_{M_1X_2})M_1V_0 \\ &=: V_0 + X_1A_{11}X_1' + V_0M_1RR'M_1V_0, \end{aligned} \]

where $R = (T : Q_{M_1X_2}B)$. Now

\[ V_0M_1RR'M_1V_0M = (M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0)M = V_0MBB'MV_0M, \]

where we have used $M_1M = M$, and hence

\[ \mathscr{C}(V_0M_1RR'M_1V_0M) \subseteq \mathscr{C}(V_0M). \]

206 S. J. Haslett et al. 


Thus, we can conclude that the matrix $V$ in 1 is also of the form $V$ in 2, and we have proved the following.

Theorem 11.3 The following statements hold.

1. The inclusion $\mathscr{V}_1 \subseteq \mathscr{V}_{12}$ holds if and only if the matrix $V \in \mathscr{V}_1$ can be expressed as
\[ V = V_0 + X_1CC'X_1' + V_0M_1DD'M_1V_0, \]
for some matrices $C$ and $D$ where
\[ \mathscr{C}(V_0M_1DD'M_1V_0M) \subseteq \mathscr{C}(V_0M). \tag{11.25} \]

2. Suppose that $\mathscr{C}(M_1X_2) \subseteq \mathscr{C}(V_0M_1)$. Then $\mathscr{V}_{12} = \mathscr{V}_1$ holds, i.e., $V \in \mathscr{V}_1 \cap \mathscr{V}_{12}$, if and only if the matrix $V$ can be expressed as
\[ V = V_0 + X_1A_{11}X_1' + M_1X_2A_{22}X_2'M_1 + V_0MBB'MV_0. \]


Remark 11.2 Putting $V_0 = I$ in part 2 of Theorem 11.2 yields claim 1 of Theorem 11.1. Similarly, putting $V_0 = I$, condition (11.25) becomes

\[ \mathscr{C}(M_1DD'M_1M) \subseteq \mathscr{C}(M), \quad\text{i.e.,}\quad P_XM_1DD'M_1M = 0, \]

which is 2 of Theorem 11.1. Thus, Theorem 11.1 follows from Theorems 11.2 and 11.3. However, we feel that it is instructive to go separately through the proof of Theorem 11.1; for example, it is interesting to notice the role of the conditions for the equality of OLSE and BLUE.


11.4 A Related Sidetrack 


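For the property that every representation of the BLUE of (estimable) $\mu_1$ under $\mathcal{M}_{12}(V_0)$ remains BLUE of $\mu_1$ under $\mathcal{M}_{12}(V)$, we can use the notation
\[ \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12}(V_0))\} \subseteq \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12}(V))\}. \tag{11.26} \]
In view of Haslett and Puntanen [4, Theorem 2.1], (11.26) holds if and only if (see also Mathew and Bhimasankaram [9, Theorem 2.4])
\[ \mathcal{C}(M_2VM) \subseteq \mathcal{C}(M_2V_0M), \quad \text{i.e.,} \quad \mathcal{C}(VM) \subseteq \mathcal{C}(X_2 : V_0M). \]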


Defining $W_0 = V_0 + XX'$, we have $\mathcal{C}(X : V_0M : Q_{W_0}) = \mathbb{R}^n$. Hence an arbitrary nonnegative definite $V$ can be expressed as $V = FF'$, where
\[ F = X_1L_1 + X_2L_2 + V_0ML_3 + Q_{W_0}L_4 , \]
for some matrices $L_1, \dots, L_4$ so that
\[ V = FF' = X_1L_{11}X_1' + X_2L_{22}X_2' + V_0ML_{33}MV_0 + Q_{W_0}L_{44}Q_{W_0} + Z + Z' , \]
where $L_{ij} = L_iL_j'$, $i, j = 1, \dots, 4$, and
\[
\begin{aligned}
Z ={}& X_1L_{12}X_2' + X_1L_{13}MV_0 + X_1L_{14}Q_{W_0} \\
     &+ X_2L_{23}MV_0 + X_2L_{24}Q_{W_0} + V_0ML_{34}Q_{W_0} .
\end{aligned}
\]
Moreover, $M_2F = M_2X_1L_1 + M_2V_0ML_3 + M_2Q_{W_0}L_4$, and
\[ F'M = L_3'MV_0M + L_4'Q_{W_0}M = L_3'MV_0M + L_4'Q_{W_0} =: K, \]
where we have used $Q_{W_0}M = Q_{W_0}$; this is due to $Q_{W_0} = Q_{MV_0}M$. Now
\[ M_2FF'M = M_2X_1L_1K + M_2V_0ML_3K + M_2Q_{W_0}L_4K, \]
and thereby $\mathcal{C}(M_2FF'M) \subseteq \mathcal{C}(M_2V_0M)$ holds if and only if
\[ \mathcal{C}(M_2X_1L_1K + M_2Q_{W_0}L_4K) \subseteq \mathcal{C}(M_2V_0M) = \mathcal{C}(M_2W_0M), \]
which further is equivalent to
\[ \mathcal{C}(X_1L_1K + Q_{W_0}L_4K) \subseteq \mathcal{C}(X_2 : W_0M). \tag{11.27} \]
Premultiplying (11.27) by $Q_{W_0} = MQ_{MV_0}$ yields $Q_{W_0}L_4K = 0$, i.e.,
\[ Q_{W_0}L_4L_3'MV_0M + Q_{W_0}L_4L_4'Q_{W_0} = 0, \]
which further holds if and only if $Q_{W_0}L_4L_4'Q_{W_0} = 0$, i.e., $L_4'Q_{W_0} = 0$. By (11.27) we have
\[ X_1L_1K = X_2U_1 + W_0MU_2 \quad \text{for some } U_1 \text{ and } U_2 . \tag{11.28} \]
It is easy to conclude that (11.28) holds if and only if $X_1L_1K = 0$, i.e.,
\[ X_1L_1L_3'MV_0M + X_1L_1L_4'Q_{W_0} = 0, \]
which holds if and only if $X_1L_1L_3'MV_0 = 0$, which together with $L_4'Q_{W_0} = 0$ yields
\[ Z = X_1L_1L_2'X_2' + X_2L_2L_3'MV_0 . \tag{11.29} \]


Thus we have proved the following. 


Theorem 11.4 The inclusion
\[ \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12}(V_0))\} \subseteq \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12}(V))\} \]
holds if and only if the matrix $V$ can be expressed as
\[
\begin{aligned}
V &= X_1L_1L_1'X_1' + X_2L_2L_2'X_2' + V_0ML_3L_3'MV_0 + Z + Z' \\
  &= XLL'X' + V_0ML_3L_3'MV_0 + X_2L_2L_3'MV_0 + V_0ML_3L_2'X_2'
\end{aligned}
\tag{11.30}
\]
for some $L' = (L_1' : L_2')$, $L_3$, and $Z$ is defined in (11.29).

Notice the striking similarity between (11.7) and (11.30), the only difference concerning the term $X_2L_2L_3'MV_0 + V_0ML_3L_2'X_2'$. The characterization of $V$ in Theorem 11.4 can be also expressed as
\[ V = X\Gamma X' + V_0M\Delta MV_0 + X_2\Xi MV_0 + V_0M\Xi'X_2' \]
for some matrices $\Gamma$, $\Delta$, and $\Xi$ such that $V$ is nonnegative definite.

Isotalo and Puntanen [6, Theorem 1] characterized the set of matrices $V$ such that the ordinary least squares estimator of estimable $K\beta$ equals BLUE. Corresponding characterizations for $V$ retaining the equality of the BLUEs of $K\beta$ in the spirit of Theorem 11.4 can be done.


11.5 OLSE Equals BLUE Under the Small Model, What 
About the Big Model? 


One can assume that $\mathrm{BLUE}(\mu_1) = \mathrm{OLSE}(\mu_1)$ under the small model and then check when the equality continues when more X-variables are added; here OLSE refers to the ordinary least squares estimator which is

• under $\mathcal{M}_1$: $\hat\mu_1(\mathcal{M}_1) = P_{X_1}y = P_1y = X_1X_1^+y$,
• under $\mathcal{M}_{12}$: $\hat\mu_1(\mathcal{M}_{12}) = P_{1\#}y = X_1(X_1'M_2X_1)^-X_1'M_2y$.


Our aim is to give several equivalent characterizations for the continuation of the 
equality of OLSE and BLUE when more X-variables are added. Some related con- 
siderations were made by Isotalo et al. [7]. Haslett and Puntanen [4, Theorem 3.1] 
considered the effect of adding regressors on the equality of the BLUEs under two 
linear models with different covariance matrices. 

Above, and throughout this section, we assume that $\mu_1 = X_1\beta_1$ is estimable under $\mathcal{M}_{12}$, i.e., $\mathcal{C}(X_1) \cap \mathcal{C}(X_2) = \{0\}$ and thereby
\[ P_X = P_{(X_1 : X_2)} = P_{1\#} + P_{2\#}, \]
where $P_{2\#} = X_2(X_2'M_1X_2)^-X_2'M_1$. Recall that by (11.3) one expression for the BLUE of $\mu$ under $\mathcal{M}_{12}$ is $G_{12}y$ where
\[ G_{12} = X(X'W^-X)^-X'W^+, \quad \text{where } W = V + XX', \]
while the BLUE of $\mu_1$ under $\mathcal{M}_1$ can be expressed as $G_1y$ where
\[ G_1 = X_1(X_1'W_1^-X_1)^-X_1'W_1^+, \quad \text{where } W_1 = V + X_1X_1'. \]
Let us denote
\[ G_{1\#} = X_1(X_1'\dot M_2X_1)^-X_1'\dot M_2, \qquad G_{2\#} = X_2(X_2'\dot M_1X_2)^-X_2'\dot M_1, \]
where $\dot M_1$ and $\dot M_2$ are (unique) matrices defined as
\[ \dot M_1 = M_1(M_1WM_1)^+M_1, \qquad \dot M_2 = M_2(M_2WM_2)^+M_2 . \]
Now under the estimability of $\mu_1 = X_1\beta_1$, we have
\[ \mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12}) = G_{1\#}y, \qquad \mathrm{BLUE}(\mu_2 \mid \mathcal{M}_{12}) = G_{2\#}y, \]
and, see Haslett et al. [5, Proposition 3.1],
\[ G_{12} = G_{1\#} + G_{2\#} . \]


Following Haslett et al. [5, Proposition 4.1], it is straightforward to confirm that $G_1G_{1\#} = G_{1\#}$ and $G_1G_{12} = G_1$ and thereby
\[ G_1 - G_{1\#} = G_1(G_{12} - G_{1\#}) = G_1G_{2\#} \]
and so we have
\[ G_{1\#} = G_1 - G_1(G_{12} - G_{1\#}) = G_1 - G_1G_{2\#}, \tag{11.31a} \]
\[ P_{1\#} = P_1 - P_1(P_{(X_1 : X_2)} - P_{1\#}) = P_1 - P_1P_{2\#}. \tag{11.31b} \]


Assume that under $\mathcal{M}_1$ we have (with probability 1) $\hat\mu_1(\mathcal{M}_1) = \tilde\mu_1(\mathcal{M}_1)$, i.e., $P_1y = G_1y$ for all $y \in \mathcal{C}(W_1)$, so that
\[ P_1(X_1 : VM_1) = G_1(X_1 : VM_1) = (X_1 : 0), \tag{11.32} \]
which obviously holds if and only if $P_1VM_1 = 0$. Notice that (11.32) means that $P_1 = G_1$. Suppose that $P_1 = G_1$. Then (11.31a) and (11.31b) become
\[ G_{1\#} = P_1 - P_1G_{2\#}, \qquad P_{1\#} = P_1 - P_1P_{2\#}, \]
and in this situation
\[ G_{1\#} - P_{1\#} = P_1P_{2\#} - P_1G_{2\#}, \]
and thereby the equality $P_{1\#}y = G_{1\#}y$ for all $y \in \mathcal{C}(W)$ can be written as
\[ P_1P_{2\#}W = P_1G_{2\#}W. \tag{11.33} \]
Denote $y = X_1a + X_2b + VMc = (X_1 : X_2 : VM)t$ for some $a$, $b$, and $c$ so that $t = (a', b', c')'$. Then
\[
\begin{aligned}
P_1P_{2\#}y - P_1G_{2\#}y &= P_1(P_{2\#} - G_{2\#})(X_1 : X_2 : VM)t \\
                          &= (0 : 0 : P_1P_{2\#}VM)t = P_1P_{2\#}VMc .
\end{aligned}
\]
Thus (11.33) is equivalent to $P_1P_{2\#}VM = 0$, i.e., $X_1'P_{2\#}VM = 0$. Actually, (11.33) means the equality
\[ P_1P_{2\#} = P_1G_{2\#}, \quad \text{i.e.,} \quad X_1'P_{2\#} = X_1'G_{2\#} . \]
This can be concluded from the fact that $P_1P_{2\#}Q_W = P_1G_{2\#}Q_W = 0$. Thus we have proved the following.
proved the following. 


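Theorem 11.5 Consider the models $\mathcal{M}_{12}$ and $\mathcal{M}_1$, and suppose that $\mu_1 = X_1\beta_1$ is estimable under $\mathcal{M}_{12}$ and $\mathrm{OLSE}(\mu_1 \mid \mathcal{M}_1) = \mathrm{BLUE}(\mu_1 \mid \mathcal{M}_1)$. Then the following statements are equivalent:

1. $\mathrm{OLSE}(\mu_1 \mid \mathcal{M}_{12}) = \mathrm{BLUE}(\mu_1 \mid \mathcal{M}_{12})$, i.e., $P_{1\#}W = G_{1\#}W$,
2. $X_1'P_{2\#}W = X_1'G_{2\#}W$,
3. $X_1'P_{2\#} = X_1'G_{2\#}$,
4. $X_1'P_{2\#}VM = 0$, and
5. $X_1'\,\mathrm{OLSE}(\mu_2 \mid \mathcal{M}_{12}) = X_1'\,\mathrm{BLUE}(\mu_2 \mid \mathcal{M}_{12})$.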


Remark 11.3 Consider the following task: Define the subspace $\mathcal{A} \subseteq \mathbb{R}^n$ so that $\mathrm{OLSE}(\mu) = \mathrm{BLUE}(\mu)$ under $\mathcal{M}_{12}$ for all $y \in \mathcal{A}$, i.e.,
\[ P_Xy = G_{12}y \quad \text{for all } y \in \mathcal{A}, \]
where $P_X = P_{(X_1 : X_2)}$ and by (11.4),
\[ G_{12} = X(X'W^-X)^-X'W^+ = P_W - VM(MVM)^-M, \tag{11.34} \]
where $W = V + XX'$. Premultiplying (11.34) by $P_X$ gives
\[ G_{12} = X(X'W^-X)^-X'W^+ = P_X - P_XVM(MVM)^-M . \]
Thus we have to solve $\mathcal{A}$ from the condition
\[ P_XVM(MVM)^-My = 0 \quad \text{for all } y \in \mathcal{A}, \]
i.e.,
\[ \mathcal{A} = \mathcal{N}(P_XZ), \quad \text{where } Z = VM(MVM)^-M, \]
where $\mathcal{N}(\cdot)$ refers to the nullspace of the matrix argument. Kramer [8] considered the corresponding problem assuming that $V$ is positive definite and $X$ has full column rank. He showed that in that case
\[ \mathcal{A} = \mathcal{C}(X) \oplus [\mathcal{C}(M) \cap \mathcal{C}(VM)]. \tag{11.35} \]
Let us relax the rank assumptions on $X$ and $V$. We utilize the result of Baksalary and Trenkler [1, Theorem 1] stating that for idempotent matrices $R$ and $S$ (of the same size), the following holds:
\[ \mathcal{N}(RS) = \mathcal{N}(S) \oplus [\mathcal{N}(R) \cap \mathcal{C}(S)]. \tag{11.36} \]
Now $P_X$ and $Z$ are idempotent and $\mathcal{C}(Z) = \mathcal{C}(VM)$, and thus (11.36) gives
\[ \mathcal{A} = \mathcal{N}(Z) \oplus [\mathcal{C}(M) \cap \mathcal{C}(Z)] = \mathcal{N}(Z) \oplus [\mathcal{C}(M) \cap \mathcal{C}(VM)]. \]
Clearly, $\mathcal{C}(X) \subseteq \mathcal{N}(Z)$, and thereby $\mathcal{C}(X) = \mathcal{N}(Z)$ if and only if
\[ r(X) = n - r(Z) = n - r(VM), \quad \text{i.e.,} \quad r(X : VM) = r(X : V) = n. \tag{11.37} \]
Thus Kramer's result (11.35) applies if and only if (11.37) holds. Further expressions for $\mathcal{A}$ appear in Baksalary et al. [2].


11.6 Conclusions 


In this article, we consider the partitioned linear model $\mathcal{M}_{12}(V_0) = \{y, X_1\beta_1 + X_2\beta_2, V_0\}$ and the corresponding small model $\mathcal{M}_1(V_0) = \{y, X_1\beta_1, V_0\}$. Following Rao [14, Sect. 5], we can characterize the sets $\mathcal{V}_1$ and $\mathcal{V}_{12}$ of nonnegative definite matrices so that
\[ V \in \mathcal{V}_1 \iff \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_1(V_0))\} \subseteq \{\mathrm{BLUE}(\mu_1 \mid \mathcal{M}_1(V))\}, \]
\[ V \in \mathcal{V}_{12} \iff \{\mathrm{BLUE}(\mu \mid \mathcal{M}_{12}(V_0))\} \subseteq \{\mathrm{BLUE}(\mu \mid \mathcal{M}_{12}(V))\}, \]
where $\mu_1 = X_1\beta_1$, $\mu = X\beta$, and $\mathcal{M}_{12}(V) = \{y, X_1\beta_1 + X_2\beta_2, V\}$ and $\mathcal{M}_1(V) = \{y, X_1\beta_1, V\}$. The notation $\{\mathrm{BLUE}(\mu \mid \mathcal{A})\} \subseteq \{\mathrm{BLUE}(\mu \mid \mathcal{B})\}$ means that every representation of the BLUE for $\mu$ under $\mathcal{A}$ remains BLUE for $\mu$ under $\mathcal{B}$. In the first three sections of this article, we focus on characterizing the mutual relations between the sets $\mathcal{V}_1$ and $\mathcal{V}_{12}$.

In Sect. 11.5, we assume that under the small model $\mathcal{M}_1$ the ordinary least squares estimator, OLSE, of $\mu_1$ equals the BLUE and give several characterizations for the continuation of the equality of OLSE and BLUE when more X-variables are added.

Our main questions are somewhat unusual and, we admit, are slightly on the 
nonpractical side. However, they offer us interesting linear-algebraic problems, and, 
moreover, sometimes new theoretical considerations may provide surprises for appli- 
cations. Moreover, as the Referee pointed out: “Models with added covariates those 
of primary interest occur quite frequently in applications”. 
Chapter 12
On Some Special Matrices and Their Applications in Linear Complementarity Problem

12.1 Introduction


The linear complementarity problem (LCP) is a combination of linear and nonlinear 
equations and inequalities. LCP is crucial in mathematical programming because it 
gives a unified approach to studying linear programming, linear fractional program- 
ming, convex quadratic programming and bimatrix games. It is always a demanding 
and interesting area for optimization researchers. The linear complementarity prob- 
lem finds a vector in a finite-dimensional real vector space that satisfies a specific 
system of inequalities. The problem is described below. 
Given a matrix $A \in \mathbb{R}^{n \times n}$ and a vector $q \in \mathbb{R}^n$, LCP is to find a vector $z \in \mathbb{R}^n$ such that
\[ q + Az \ge 0, \quad z \ge 0, \tag{12.1} \]
\[ z^T(q + Az) = 0, \tag{12.2} \]
or to demonstrate that no such vector $z$ exists. This problem is denoted by LCP$(q, A)$.

Let
\[ w = q + Az. \tag{12.3} \]
Then LCP (12.1) and (12.2) can be formulated as follows:
\[ w = q + Az, \quad w \ge 0, \quad z \ge 0, \tag{12.4} \]
\[ w^Tz = 0. \tag{12.5} \]


Condition (12.5) is frequently substituted in place of (12.2). When talking about 
algorithms, this method of LCP representation is useful. 

For a matrix $A \in \mathbb{R}^{n \times n}$ and a vector $q \in \mathbb{R}^n$, the feasible set and the solution set of LCP$(q, A)$ can be defined by $F(q, A) = \{z \in \mathbb{R}^n \mid z \ge 0,\ Az + q \ge 0\}$ and $S(q, A) = \{z \in F(q, A) \mid z^T(Az + q) = 0\}$, respectively. A solution $(w, z)$ of the LCP$(q, A)$ is said to be nondegenerate if $w + z > 0$. If every solution $(w, z)$ of the LCP$(q, A)$ is nondegenerate, then the vector $q \in \mathbb{R}^n$ is said to be nondegenerate with respect to the matrix $A$.
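The definitions of $F(q, A)$, $S(q, A)$ and nondegeneracy can be checked mechanically for a given candidate $z$. The following is a minimal sketch in plain Python; the function names and the small $2 \times 2$ data are illustrative assumptions, not taken from the chapter.

```python
from fractions import Fraction


def lcp_w(A, q, z):
    """Return w = q + A z for a square matrix A given as a list of rows."""
    n = len(q)
    return [q[i] + sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]


def is_feasible(A, q, z):
    """z belongs to F(q, A): z >= 0 and q + A z >= 0."""
    return all(zi >= 0 for zi in z) and all(wi >= 0 for wi in lcp_w(A, q, z))


def is_solution(A, q, z):
    """z belongs to S(q, A): feasible and complementary, z^T (q + A z) = 0."""
    w = lcp_w(A, q, z)
    return is_feasible(A, q, z) and sum(wi * zi for wi, zi in zip(w, z)) == 0


def is_nondegenerate(A, q, z):
    """A solution (w, z) is nondegenerate when w + z > 0 componentwise."""
    w = lcp_w(A, q, z)
    return is_solution(A, q, z) and all(wi + zi > 0 for wi, zi in zip(w, z))


# Hypothetical data chosen only for illustration (exact rational arithmetic).
A = [[2, 1], [1, 2]]
q = [-1, -1]
z = [Fraction(1, 3), Fraction(1, 3)]
print(is_solution(A, q, z), is_nondegenerate(A, q, z))
```

Exact rationals are used so that the complementarity test $w^Tz = 0$ is not disturbed by rounding; with floating-point data a small tolerance would be needed instead.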

In the literature on mathematical programming, LCP is well studied, and it has 
several applications in operations research, mathematical economics, geometry and 
engineering which have been described by Cottle et al. [5] and Murty and Yu [21]. In 
the field of stochastic games, the linear complementarity problem has recently found 
several applications (see Mohan et al. [16]). The research of the LCP, particularly 
the study of the LCP algorithms and the study of matrix classes, has been inspired 
by these kinds of applications and the possibility of future applications. In fact, the 
assumption that matrix A belongs to a particular class of matrices is the foundation 
of most of the linear complementarity theory and algorithms. This chapter discusses 
recent classes of matrices studied in the last two decades and summarizes some main 
results pertaining to these classes. 

The chapter is organized as follows. Section 12.2 presents some notations, defini- 
tions of various matrix classes, the concept of principal pivot transform (PPT) and the 
basics of the zero-sum matrix game. Section 12.3 introduces a hidden Z-matrix with 
some examples and properties. Further, it shows the game theory-based characteri- 
zation of hidden Z-matrices. Section 12.4 is devoted to studying the class of positive 
subdefinite matrices (PSBD) and indicates that LCP with PSBD matrices of rank
$\ge 2$ is processable by Lemke’s algorithm. Section 12.5 discusses the generalized
positive subdefinite matrices (GPSBD) and their properties in the context of LCP. 
Section 12.6 defines the weak generalized positive subdefinite matrices (WGPSBD). 
Section 12.7 presents GPSBD matrices of level k and identifies that this class is a 
subclass of row sufficient matrices. The aim of Sect. 12.8 is to introduce the class 
of fully copositive matrices and their properties. Section 12.9 presents the singular 
$N_0$-matrices with Q-property. In Sect. 12.10, we study the almost type class of matrices
with Q-property and obtain a sufficient condition for almost N-matrices to hold
Q-property. Further, we present the generalization of almost type class of matrices 
with examples. Section 12.11 introduces various classes of matrices that are defined 
based on principal pivot transforms. Section 12.12 presents a list of open problems. 
12.2 Preliminaries


We consider matrices and vectors with real entries. Any vector $x \in \mathbb{R}^n$ is a column vector unless otherwise specified, and $x^T$ denotes the transpose of $x$. For any matrix $A \in \mathbb{R}^{n \times n}$, $a_{ij}$ denotes its $i$th row and $j$th column entry, $A_{i\cdot}$ denotes the $i$th row of $A$ and $A_{\cdot j}$ denotes the $j$th column of $A$. For any set $\alpha$, $|\alpha|$ denotes its cardinality. For any index set $\alpha \subseteq \{1, 2, \dots, n\}$, $\bar\alpha$ denotes its complement in $\{1, 2, \dots, n\}$. If $A$ is a matrix of order $n$, $\alpha \subseteq \{1, 2, \dots, n\}$ and $\beta \subseteq \{1, 2, \dots, n\}$, then $A_{\alpha\beta}$ represents the submatrix of $A$ consisting of only those rows and columns whose indices are in $\alpha$ and $\beta$, respectively. If $\alpha = \beta$, then the submatrix $A_{\alpha\alpha}$ is referred to as a principal submatrix of $A$ and $\det A_{\alpha\alpha}$ is referred to as a principal minor of $A$. We denote the identity matrix by $I$ and the empty set by $\emptyset$. A vector $x \in \mathbb{R}^n$ is unisigned if either $x_i \ge 0$ or $x_i \le 0$ for all $1 \le i \le n$.

We now define some matrix classes that will be needed in the subsequent sections, such as positive definite (PD), positive semidefinite (PSD), positive subdefinite (PSBD), merely positive subdefinite (MPSBD), semimonotone ($E_0$), fully semimonotone ($E_0^f$), copositive ($C_0$), strictly copositive ($C$), copositive-plus ($C_0^+$), copositive-star ($C_0^*$) and fully copositive ($C_0^f$).

A matrix $A \in \mathbb{R}^{n \times n}$ is said to be

• $Q$: if $\forall q \in \mathbb{R}^n$, LCP$(q, A)$ has a solution.
• $Q_0$: if $F(q, A) \ne \emptyset \Rightarrow S(q, A) \ne \emptyset$.
• PD: if $\forall\, 0 \ne x \in \mathbb{R}^n$, $x^TAx > 0$.
• PSD: if $\forall x \in \mathbb{R}^n$, $x^TAx \ge 0$.
• PSBD: if $\forall x \in \mathbb{R}^n$, $x^TAx < 0$ implies either $A^Tx \le 0$ or $A^Tx \ge 0$.
• MPSBD: if $A$ is a PSBD matrix but not a PSD matrix.
• $E_0$: if $\forall\, 0 \ne x \ge 0$ there exists $i$ such that $x_i > 0$ and $(Ax)_i \ge 0$.
• $E_0^f$: if every PPT of $A$ is in $E_0$.
• $C_0$: if $\forall x \ge 0 \Rightarrow x^TAx \ge 0$.
• $C$: if $\forall\, 0 \ne x \ge 0 \Rightarrow x^TAx > 0$.
• $C_0^+$: if $A \in C_0$ and $[x^TAx = 0,\ x \ge 0] \Rightarrow [(A + A^T)x = 0]$.
• $C_0^*$: if $A \in C_0$ and $[x^TAx = 0,\ Ax \ge 0,\ x \ge 0] \Rightarrow A^Tx \le 0$.
• $C_0^f$: if every PPT of $A$ is in $C_0$.
• $Z$: if $a_{ij} \le 0$ $\forall i, j \in \{1, 2, \dots, n\}$ such that $i \ne j$.
• $N$: if all its principal minors are negative.
• $N_0$: if all its principal minors are nonpositive.
• $P$: if all its principal minors are positive.
• $P_0$: if all its principal minors are nonnegative.
• $K$: if it is a $Z$-matrix as well as a $P$-matrix.
• $S$: if there exists a vector $x > 0$ such that $Ax > 0$.
• $R$: if $\forall t \ge 0$, LCP$(te, A)$ has only the trivial solution.
• $R_0$: if LCP$(0, A)$ has only the trivial solution.
• Column sufficient: if $\forall x \in \mathbb{R}^n$, $x_i(Ax)_i \le 0\ \forall i \Rightarrow x_i(Ax)_i = 0\ \forall i$.


218 S. K. Neogy and G. Singh 


• Row sufficient: if $A^T$ is column sufficient.
• Sufficient: if $A$ and $A^T$ are both column sufficient.


Note that $P \subseteq P_0 \subseteq E_0^f \subseteq E_0$ and $C_0^f \subseteq E_0^f$. A matrix class is said to be complete if all principal submatrices of its elements belong to the same class. We say that an LCP can be processed by Lemke’s algorithm if Lemke’s algorithm can either compute a solution to it, if one exists, or demonstrate that there is no solution. (See Cottle et al. [5] for more details on Lemke’s algorithm.)
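Several of the classes above ($Z$, $P$, $P_0$, $K$) are defined directly through entries or principal minors, so membership can be tested by brute force for small matrices. The sketch below is one way to do this in plain Python with exact rational arithmetic; the helper names are hypothetical, and the enumeration over all $2^n - 1$ principal minors is only practical for small $n$.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        # Find a row at or below c with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def principal_minors(A):
    """All principal minors det(A_aa) over nonempty index sets a."""
    n = len(A)
    return [det([[A[i][j] for j in a] for i in a])
            for k in range(1, n + 1) for a in combinations(range(n), k)]


def is_Z(A):
    n = len(A)
    return all(A[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


def is_P(A):
    return all(m > 0 for m in principal_minors(A))


def is_P0(A):
    return all(m >= 0 for m in principal_minors(A))


def is_K(A):
    return is_Z(A) and is_P(A)


# Illustrative matrices (assumptions, not from the chapter).
print(is_K([[2, -1], [-1, 2]]))   # Z-matrix with positive principal minors
print(is_P0([[0, 1], [1, 0]]))    # determinant is -1, so not P_0
```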

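The concept of principal pivot transforms (PPTs) was introduced by Tucker [31]. The PPT of a matrix $A$ with respect to $\alpha \subseteq \{1, \dots, n\}$ is defined as the matrix given by
\[
A' = \begin{bmatrix} A'_{\alpha\alpha} & A'_{\alpha\bar\alpha} \\ A'_{\bar\alpha\alpha} & A'_{\bar\alpha\bar\alpha} \end{bmatrix},
\]
where $A'_{\alpha\alpha} = (A_{\alpha\alpha})^{-1}$, $A'_{\alpha\bar\alpha} = -(A_{\alpha\alpha})^{-1}A_{\alpha\bar\alpha}$, $A'_{\bar\alpha\alpha} = A_{\bar\alpha\alpha}(A_{\alpha\alpha})^{-1}$, $A'_{\bar\alpha\bar\alpha} = A_{\bar\alpha\bar\alpha} - A_{\bar\alpha\alpha}(A_{\alpha\alpha})^{-1}A_{\alpha\bar\alpha}$. The expression $A_{\bar\alpha\bar\alpha} - A_{\bar\alpha\alpha}(A_{\alpha\alpha})^{-1}A_{\alpha\bar\alpha}$ is the Schur complement of $A_{\alpha\alpha}$ in $A$ and is denoted as $(A/A_{\alpha\alpha})$. The PPT of LCP$(q, A)$ with respect to $\alpha$ (obtained by pivoting on $A_{\alpha\alpha}$) is given by LCP$(q', A')$ where $q'_\alpha = -A_{\alpha\alpha}^{-1}q_\alpha$ and $q'_{\bar\alpha} = q_{\bar\alpha} - A_{\bar\alpha\alpha}A_{\alpha\alpha}^{-1}q_\alpha$. We use the notation $\wp_\alpha(A)$ $(= A')$ for the PPT of $A$ with respect to $\alpha \subseteq \{1, \dots, n\}$. Note that the PPT is only defined with respect to those $\alpha$ for which $\det A_{\alpha\alpha} \ne 0$.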
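The block formulas for the PPT translate directly into code. The following sketch, in plain Python with exact rationals, builds $\wp_\alpha(A)$ entry by entry from $(A_{\alpha\alpha})^{-1}$; the function names are hypothetical, indices are 0-based (so $\alpha = \{1\}$ in the text corresponds to `[0]` here), and the $2 \times 2$ test matrix is an illustrative assumption.

```python
from fractions import Fraction


def inverse(M):
    """Exact Gauss-Jordan inverse; raises ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("det A_aa = 0: the PPT is not defined for this index set")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ppt(A, alpha):
    """Principal pivot transform of A with respect to the index list alpha."""
    n = len(A)
    alpha = sorted(alpha)
    abar = [i for i in range(n) if i not in alpha]
    k = len(alpha)
    inv = inverse([[A[i][j] for j in alpha] for i in alpha])  # (A_aa)^{-1}
    M = [[Fraction(0)] * n for _ in range(n)]
    for a, i in enumerate(alpha):
        for b, j in enumerate(alpha):
            M[i][j] = inv[a][b]
        for j in abar:
            # -(A_aa)^{-1} A_{a, abar}
            M[i][j] = -sum(inv[a][c] * A[alpha[c]][j] for c in range(k))
    for i in abar:
        for b, j in enumerate(alpha):
            # A_{abar, a} (A_aa)^{-1}
            M[i][j] = sum(A[i][alpha[c]] * inv[c][b] for c in range(k))
        for j in abar:
            # Schur complement A_{abar,abar} - A_{abar,a} (A_aa)^{-1} A_{a,abar}
            M[i][j] = A[i][j] - sum(A[i][alpha[c]] * inv[c][d] * A[alpha[d]][j]
                                    for c in range(k) for d in range(k))
    return M


A = [[2, 1], [1, 3]]   # illustrative matrix, not from the chapter
B = ppt(A, [0])
print(B)
print(ppt(B, [0]) == A)  # pivoting twice on the same block recovers A
```

Taking `alpha` to be the full index set returns $A^{-1}$, and taking it empty returns $A$ itself, which matches the two extreme cases of the definition.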

Tucker demonstrated that $A$ is a $P$-matrix if the diagonal entries for each PPT of $A$ are positive. However, $A$ need not be a $P_0$-matrix if the diagonal entries for each PPT of $A$ are nonnegative. The diagonal elements of each PPT of the matrix $A = [\,\cdots\,]$ can be easily verified to be nonnegative, but $A \notin P_0$.


The following proposition observed by Dubey and Neogy [8] shows that the class 
of S-matrices is invariant under PPT. 


Proposition 12.1 Let $\wp_\alpha(A) \in \mathbb{R}^{n \times n}$ be a PPT of a given matrix $A \in \mathbb{R}^{n \times n}$ with respect to $\alpha \subseteq \{1, \dots, n\}$. Then $A \in S$ if and only if $\wp_\alpha(A) \in S$.


We present some fundamentals of a two-person zero-sum matrix game due to Von 
Neumann and Morgenstern [32], which will be required in the sequel. A two-player 
zero-sum matrix game is defined as follows. 

Player 1 selects an integer $i$ $(i = 1, \dots, m)$ and player 2 selects an integer $j$ $(j = 1, \dots, n)$ simultaneously. Then player 1 pays player 2 an amount $a_{ij}$ (which can be positive, negative or zero). The game is considered to be zero-sum because player 2’s gain is player 1’s loss. A strategy for player 1 is a probability vector $x \in \mathbb{R}^m$, the $i$th component of which represents the probability of selecting an integer $i$, where $x_i \ge 0$ for $i = 1, \dots, m$ and $\sum_{i=1}^m x_i = 1$. Similarly, a strategy for player 2 is a probability vector $y \in \mathbb{R}^n$.

According to Von Neumann’s fundamental minimax theorem, there exist strategies $x^*$, $y^*$ and a real number $v$ such that
\[ \sum_{i=1}^m x_i^* a_{ij} \le v \ \ \forall j = 1, \dots, n, \qquad \sum_{j=1}^n a_{ij} y_j^* \ge v \ \ \forall i = 1, \dots, m. \]
The strategies $(x^*, y^*)$ with $x^* \in \mathbb{R}^m$ and $y^* \in \mathbb{R}^n$ are said to be optimal strategies for player 1 and player 2, respectively, and $v$ is known as the minimax value of the game. The payoff matrix in the game described is $A = (a_{ij})$. The value of the game corresponding to the payoff matrix $A$ is denoted by $v(A)$. The minimizer is player 1, and the maximizer is player 2 in the game mentioned above. The value of the game $v(A)$ is positive (nonnegative) if there exists a $0 \ne y \ge 0$ such that $Ay > 0$ $(Ay \ge 0)$. Similarly, $v(A)$ is negative (nonpositive) if there exists a $0 \ne x \ge 0$ such that $A^Tx < 0$ $(A^Tx \le 0)$. A mixed strategy $x$ for player 1 is said to be completely mixed if $x > 0$, which means that all of the rows of the payoff matrix $A$ are selected with a positive probability. A completely mixed strategy $y$ for player 2 is defined in the same way. A matrix game with a payoff matrix $A \in \mathbb{R}^{n \times n}$ is said to be completely mixed if all optimal mixed strategies $x$ for player 1 and $y$ for player 2 are completely mixed.


12.3 The Class of Hidden Z-Matrices


Mangasarian presented a generalization of the class of Z-matrices for studying the class of linear complementarity problems that can be solved by linear programs (see Mangasarian [10-12]). Since many of the properties that hold for the class $Z$ also hold for this generalization, Pang proposed calling it the class of hidden Z-matrices. Following Mangasarian [10] and Pang [28], a hidden Z-matrix is defined as follows.


Definition 12.1 A matrix $A \in \mathbb{R}^{n \times n}$ is said to be a hidden Z-matrix if there exist Z-matrices $X, Y \in \mathbb{R}^{n \times n}$ and $r, s \in \mathbb{R}^n_+$ such that

(i) $AX = Y$,
(ii) $r^TX + s^TY > 0$.


We write hidden Z for the class of hidden Z-matrices. A necessary and sufficient condition was developed by Pang [28] for a hidden Z-matrix to be a P-matrix. Chu [2] investigated the class of hidden $Z \cap P$-matrices, which is also known as the class of hidden K-matrices.

We have the following example of a hidden Z-matrix. 
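Example 1
Consider the matrix
\[
A = \begin{bmatrix} 0 & -2 & 2 \\ -4 & 12 & -10 \\ 0 & -6 & 4 \end{bmatrix}
\]
with
\[
X = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & -2 \\ 0 & 0 & -2 \end{bmatrix}
\quad \text{and} \quad
Y = \begin{bmatrix} 0 & -4 & 0 \\ -8 & 24 & -4 \\ 0 & -12 & 4 \end{bmatrix}.
\]
Let $r = (16, 0, 0)^T$ and $s = (0, 3, 4)^T$; then $r^TX + s^TY = [\,8 \ \ 24 \ \ 4\,]$.

We see that $X$ and $Y$ are Z-matrices. Also, $AX = Y$ and $r^TX + s^TY > 0$. Thus $A$ is a hidden Z-matrix.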
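The certificate in Example 1 can be verified by direct arithmetic. The short check below is a sketch in plain Python that assumes the data are read as $A$, $X$, $Y$ with rows $(0,-2,2)$, $(-4,12,-10)$, $(0,-6,4)$; $(2,0,0)$, $(0,2,-2)$, $(0,0,-2)$; $(0,-4,0)$, $(-8,24,-4)$, $(0,-12,4)$, and $r = (16,0,0)^T$, $s = (0,3,4)^T$; the helper names are hypothetical.

```python
A = [[0, -2, 2], [-4, 12, -10], [0, -6, 4]]
X = [[2, 0, 0], [0, 2, -2], [0, 0, -2]]
Y = [[0, -4, 0], [-8, 24, -4], [0, -12, 4]]
r = [16, 0, 0]
s = [0, 3, 4]


def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_Z(M):
    """Z-matrix test: all off-diagonal entries are nonpositive."""
    n = len(M)
    return all(M[i][j] <= 0 for i in range(n) for j in range(n) if i != j)


# Row vector r^T X + s^T Y from condition (ii) of Definition 12.1.
certificate = [sum(r[i] * X[i][j] + s[i] * Y[i][j] for i in range(3))
               for j in range(3)]

print(matmul(A, X) == Y)           # condition (i): AX = Y
print(is_Z(X) and is_Z(Y))         # X and Y are Z-matrices
print(certificate)                 # should be componentwise positive
```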


Now, we study some properties of hidden Z-matrices and discuss the existence 
of the solution of LCP(g, A) where A is a hidden Z-matrix. 

It should be noted that many of the results that apply to the Z class admit an 
extension to the hidden Z class because the class of hidden Z-matrices generalizes 
the class of Z-matrices. All proper principal submatrices need not be hidden Z- 
matrices, even if A is a hidden Z-matrix. Now, we have the following example. 


Example 2
From Example 1, we can see that $A$ is a hidden Z-matrix, but for $\alpha = \{1, 3\}$, the principal submatrix
\[ A_{\alpha\alpha} = \begin{bmatrix} 0 & 2 \\ 0 & 4 \end{bmatrix} \]
is not a hidden Z-matrix. To prove this, suppose on the contrary that there exists a Z-matrix
\[ X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \]
such that
\[ Y = A_{\alpha\alpha}X = \begin{bmatrix} 2x_{21} & 2x_{22} \\ 4x_{21} & 4x_{22} \end{bmatrix} \in Z. \]
Since $X \in Z$, we have $x_{12} \le 0$ and $x_{21} \le 0$. Further, $Y \in Z$ implies $x_{22} \le 0$. As both entries in the second column of $X$ and of $Y$ are less than or equal to zero, for any $r, s \in \mathbb{R}^2_+$ the second component of the row vector $r^TX + s^TY$ will always be less than or equal to zero. Hence $A_{\alpha\alpha}$ is not a hidden Z-matrix.
Furthermore, it is commonly known that the inverse of a K -matrix is nonnegative. 
However, unlike the case of a K -matrix, the inverse of a hidden K -matrix is not always 
nonnegative. The same is demonstrated in the example below. 


Example 3 (Dubey and Neogy [8])
Consider the matrix
\[
A = \begin{bmatrix} 2 & -3 & -4 \\ -1 & 2 & 3 \\ 2 & -2 & 5 \end{bmatrix}.
\]
We can easily see that $A$ is a hidden K-matrix. But
\[
A^{-1} = \begin{bmatrix} 2.2857 & 3.2857 & -0.1429 \\ 1.5714 & 2.5714 & -0.2857 \\ -0.2857 & -0.2857 & 0.1429 \end{bmatrix}
\]
is not a nonnegative matrix. Indeed, no PPT of $A$ is a nonnegative matrix.


Now, we consider a class of hidden Z-matrices from the LCP’s point of view. To 
solve LCPs as linear programs, Mangasarian introduced the hidden Z class, and he 
demonstrated the following result. 


Theorem 12.1 (Mangasarian [10]) Let $A \in$ hidden Z and $F(q, A) \ne \emptyset$. Then the LCP$(q, A)$ has a solution which can be obtained by solving the linear program LP$(p, q, A)$:

Minimize $p^Tx$
subject to $q + Ax \ge 0$,
$x \ge 0$,

where $p = r + A^Ts$, and $r$ and $s$ are as in Definition 12.1.


We state the following lemmas proved by Dubey and Neogy [8], which are useful 
to prove subsequent theorems. 


Lemma 12.1 (Dubey and Neogy [8]) Let $A \in \mathbb{R}^{n \times n}$ be a hidden Z-matrix. Let $\wp_\alpha(A)$ be a PPT of $A$ with respect to $\alpha \subseteq \{1, \dots, n\}$. Then $\wp_\alpha(A)$ is a hidden Z-matrix.


By constructing an equivalent LCP, we demonstrate that an LCP(g, A) involving 
a hidden Z-matrix can be solved whenever it is feasible. We construct this LCP in 
the subsequent lemma. 
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| 


Lemma 12.2 (Dubey and Neogy [8]) Consider LCP(qg, A) with A = B 0 


q= 1? | , A € R"™" is a hidden Z-matrix, p, q € R", where p =r + A's, andr 


and s are as in Definition 12.1. a € F(q, A) then (I - A’) ytp>o. 
The relationship between the solution of LCP$(q, A)$ and LCP$(\tilde q, \tilde A)$ is examined in the following lemma.


Lemma 12.3 (Dubey and Neogy [8]) Consider the LCP$(\tilde q, \tilde A)$ in Lemma 12.2 and LCP$(q, A)$. Then LCP$(q, A)$ has a solution if and only if LCP$(\tilde q, \tilde A)$ has a solution.


Theorem 12.2 (Dubey and Neogy [8]) Suppose $A \in$ hidden Z. Then $A \in Q_0$.


Proof It is noted that the feasibility of LCP$(q, A)$ implies the feasibility of LCP$(\tilde q, \tilde A)$. Since $\tilde A \in$ PSD, we have $\tilde A \in Q_0$, which implies that LCP$(\tilde q, \tilde A)$ has a solution. By Lemma 12.3, LCP$(q, A)$ has a solution if and only if LCP$(\tilde q, \tilde A)$ has a solution. Thus, the feasibility of LCP$(q, A)$ implies solvability of LCP$(q, A)$. Hence $A \in Q_0$. □


Theorem 12.3 (Dubey and Neogy [8]) Let $\wp_\alpha(A) \in \mathbb{R}^{n \times n}$ be a PPT of a given matrix $A \in \mathbb{R}^{n \times n}$ for $\alpha \subseteq \{1, \dots, n\}$. Suppose $\wp_\alpha(A) \in$ hidden $Z \cap S$. Then $A \in Q$.


Proof Assume that for $\alpha \subseteq \{1, \dots, n\}$, $\wp_\alpha(A) \in$ hidden $Z \cap S$. It follows from Proposition 12.1 and Lemma 12.1 that $A \in$ hidden $Z \cap S$. Further, by Theorem 12.2, we get $A \in Q_0 \cap S$. Since $Q = Q_0 \cap S$ (see Cottle et al. [5, p. 146]), we conclude that $A \in Q$.


It should be noted that it is unknown whether or not Lemke’s algorithm can solve LCP$(q, A)$ involving a hidden Z-matrix. However, in the following proposition, we see that the equivalent LCP$(\tilde q, \tilde A)$ is processable by Lemke’s algorithm.


Proposition 12.2 (Dubey and Neogy [8]) LCP$(\tilde q, \tilde A)$ is processable by Lemke’s algorithm.


Proof Note that $\tilde A \in \mathrm{PSD} \subseteq P_0 \cap Q_0$. As a result of Aganagic and Cottle’s [1] theorem, LCP$(\tilde q, \tilde A)$ is processable by Lemke’s algorithm.


Neogy and Das [23] proposed an open problem for the complete characterization of the class of matrices with at least one PPT that is a Z-matrix. Dubey and Neogy [8] observed that this class is a subclass of the hidden Z class. Let $\bar Z$ denote the class of matrices for which at least one PPT is a Z-matrix.


Theorem 12.4 (Dubey and Neogy [8]) $\bar Z \subseteq$ hidden Z.


Proof Let $A \in \bar Z$ and, for $\alpha = \alpha_0 \subseteq \{1, \dots, n\}$, $\wp_{\alpha_0}(A) \in Z$. This implies $\wp_{\alpha_0}(A) \in$ hidden Z. From Lemma 12.1, it follows that $A \in$ hidden Z.


The matrix $A$ in Example 3 demonstrates that $\bar Z$ does not capture the entire hidden Z class. It is easy to show that $\wp_\alpha(A) \notin Z$ for any $\alpha \subseteq \{1, \dots, n\}$.
12.3.1 Various Game Theory-Based Characterizations of Hidden Z-Matrices


In this section, we demonstrate how two-player matrix game results from Von Neumann and Morgenstern [32] and Raghavan [30] can be applied to generate interesting results in LCP and related matrix classes.

The sign of the value of the matrix game corresponding to the payoff matrix $A \in \mathbb{R}^{n \times n}$ is maintained in all of its PPTs, in the sense that $v(A) > 0$ if and only if $v(\wp_\alpha(A)) > 0$ (Neogy and Das [23]). The existence of solutions to the LCP$(q, A)$ is fundamentally dependent on the study of the matrix classes $Q$ and $Q_0$. The value of the matrix game $v(A)$ can be used to associate the matrix classes $Q$ and $Q_0$. Indeed, $A \in Q$ if and only if $A \in Q_0$ for any matrix game with $v(A) > 0$.

The following lemma proved by Raghavan [30] is useful to prove the subsequent 
theorem. 


Lemma 12.4 (Raghavan [30]) Let $A \in \mathbb{R}^{n \times n}$ be a payoff matrix whose off-diagonal entries are nonpositive. If $v(A) > 0$, then the game is completely mixed.


Dubey and Neogy [8] have generalized the above lemma in the following theorem.

Theorem 12.5 Let $A \in \bar Z$. If $v(A) > 0$ then the game is completely mixed.


Proof It is worth noting that $A \in \bar Z$ implies that $\wp_\alpha(A) \in Z$ for some $\alpha \subseteq \{1, \dots, n\}$. From Theorem 12.2, $A \in Q_0$. In addition, since $v(A) > 0$, $A \in Q$. Since the PPT of a Q-matrix is a Q-matrix, it follows that $\wp_\alpha(A) \in Q$. As a result, $v(\wp_\alpha(A)) > 0$. From Lemma 12.4, the game with payoff matrix $\wp_\alpha(A)$ is completely mixed. We find that the game with payoff matrix $A \in \bar Z$ is also completely mixed, since the payoff matrix $A$ is a PPT of the payoff matrix $\wp_\alpha(A)$.


The following is another result involving hidden Z-matrices presented by Dubey 
and Neogy [8]. 


Theorem 12.6 Let $A$ be a hidden Z-matrix, and suppose there exists a positive diagonal matrix $D$ such that $(AD + DA^T)$ is positive definite. Then $A \in Q$.


Proof The matrix $A$ being hidden Z implies that $A \in Q_0$. To prove that $A \in Q$, we need to show that $v(A) > 0$. Suppose to the contrary that $v(A) \le 0$. Then there exists a probability vector $y \in \mathbb{R}^n$ such that $y^TA \le 0$. Thus for a positive diagonal matrix $D$, $y^TAD \le 0$. Now $y^T(AD + DA^T)y = y^TADy + y^TDA^Ty \le 0$. This contradicts the positive definiteness of $(AD + DA^T)$. Therefore $v(A) > 0$. Hence $A \in Q$.
12.4 The Class of Positive Subdefinite Matrices


In this section, we study positive subdefinite matrices (PSBD) introduced by Crouzeix et al. [7] and show that LCPs with PSBD matrices of rank $\ge 2$ are processable by Lemke’s algorithm and that a PSBD matrix of rank $\ge 2$ belongs to the class of sufficient matrices studied by Cottle et al. [4]. The class of PSBD matrices is a generalization of the class of PSD matrices and is helpful in the study of quadratic programming problems.

Knowing whether a PSD matrix’s properties also apply to a PSBD matrix is 
important, since a PSBD matrix is a natural generalization of a PSD matrix. We may 
particularly examine whether 


(i) $A$ is PSBD if and only if $(A + A^T)$ is PSBD,
(ii) any PPT of a PSBD matrix is a PSBD matrix.


The following examples demonstrate that the above statements are not true. 


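Example 4
Consider a matrix
\[ A = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}. \]
Then for any vector $x = (x_1, x_2)^T$, $x^TAx = 2x_1x_2 < 0$ implies $x_1$ and $x_2$ are of opposite sign. Clearly, $A \in$ PSBD because $x^TAx < 0$ and $A^Tx = (0, 2x_1)^T$ imply either $A^Tx \le 0$ or $A^Tx \ge 0$. Also, we can easily see that
\[ A + A^T = \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix} \]
is not a PSBD matrix.

Similarly, consider a matrix
\[ A = \begin{bmatrix} 0 & -4 \\ 2 & 0 \end{bmatrix} \quad \text{so that} \quad A + A^T = \begin{bmatrix} 0 & -2 \\ -2 & 0 \end{bmatrix}. \]
It can be easily verified that $A + A^T$ is PSBD, but $A$ is not a PSBD matrix.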
Example 5
Consider the matrix
\[ A = \begin{bmatrix} 0 & 4 \\ -2 & 0 \end{bmatrix}. \]
It should be noted that $A \in$ PSBD, but it is easy to verify that
\[ A^{-1} = \begin{bmatrix} 0 & -\tfrac{1}{2} \\ \tfrac{1}{4} & 0 \end{bmatrix} \]
is not a PSBD matrix. Since $A^{-1}$ is a PPT of $A$, a PPT of a PSBD matrix need not be a PSBD matrix.
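A matrix fails to be PSBD as soon as one vector $x$ is found with $x^TAx < 0$ and $A^Tx$ not unisigned, so non-membership can be demonstrated by a simple search. The sketch below, in plain Python, scans a small integer grid for such a witness; the function names and the grid are illustrative assumptions, and the matrices are those of Examples 4 and 5 read as $\begin{bmatrix} 0 & 4 \\ -2 & 0 \end{bmatrix}$, its inverse, and $\begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}$. Finding no witness on the grid is only consistent with the PSBD property; it is not a proof of it.

```python
from fractions import Fraction
from itertools import product


def quad_form(A, x):
    """x^T A x."""
    n = len(A)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def transpose_times(A, x):
    """A^T x."""
    n = len(A)
    return [sum(A[i][j] * x[i] for i in range(n)) for j in range(n)]


def unisigned(v):
    return all(c >= 0 for c in v) or all(c <= 0 for c in v)


def psbd_witness(A, grid=range(-3, 4)):
    """Return some x with x^T A x < 0 and A^T x not unisigned, or None."""
    for x in product(grid, repeat=len(A)):
        if quad_form(A, x) < 0 and not unisigned(transpose_times(A, x)):
            return x
    return None


A = [[0, 4], [-2, 0]]
A_inv = [[0, Fraction(-1, 2)], [Fraction(1, 4), 0]]
A_sym = [[0, 2], [2, 0]]

print(psbd_witness(A))       # no violating vector on the grid
print(psbd_witness(A_inv))   # a violating vector: A_inv is not PSBD
print(psbd_witness(A_sym))   # a violating vector: this matrix is not PSBD
```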


We now examine whether PSBD $\subseteq Q_0$ and whether LCPs with PSBD matrices are processable by Lemke’s algorithm. We need the following results to prove the subsequent theorems.


Theorem 12.7 (Crouzeix et al. [7]) Let $A = ab^T$, where $a \ne b$; $a, b \in \mathbb{R}^n$. $A$ is PSBD if and only if one of the following conditions holds:

(i) there exists a $t > 0$ such that $b = ta$,
(ii) for all $t > 0$, $b \ne ta$ and either $b \ge 0$ or $b \le 0$.

Furthermore, assume that $A \in$ MPSBD. Then $A \in C_0$ if and only if either ($a \ge 0$ and $b \ge 0$) or ($a \le 0$ and $b \le 0$), and $A \in C_0^*$ if and only if $A \in C_0$ and $b_i = 0 \Rightarrow a_i = 0$.


Theorem 12.8 (Crouzeix et al. [7]) Suppose $A \in \mathbb{R}^{n \times n}$ is PSBD and rank$(A) \ge 2$. Then $A^T$ is PSBD and at least one of the following conditions holds:

(i) $A \in$ PSD;
(ii) $(A + A^T) \le 0$;
(iii) $A \in C_0^*$.


Lemma 12.5 (Mohan et al. [14]) Suppose $A \in \mathbb{R}^{n \times n}$ is a PSBD matrix with rank$(A) \ge 2$ and $A + A^T \le 0$. If $A$ is not a skew-symmetric matrix, then $A \le 0$.


Theorem 12.9 (Mohan et al. [14]) Suppose $A \in \mathbb{R}^{n \times n}$ is a PSBD matrix with rank$(A) \ge 2$. Then $A \in Q_0$.


Proof By Theorem 12.8, $A^T \in$ PSBD. Also, by the same theorem, either $A \in$ PSD or $(A + A^T) \le 0$ or $A \in C_0^*$. If $A \in C_0^*$, then $A \in Q_0$ (see Cottle et al. [5]). Now, if $(A + A^T) \le 0$ and $A$ is not skew-symmetric, then from Lemma 12.5, we have $A \le 0$. In this case, $A \in Q_0$. However, if $A$ is skew-symmetric, then $A \in$ PSD. Hence $A \in Q_0$.
Corollary 12.1 (Mohan et al. [14]) Let $A$ be a PSBD matrix with rank$(A) \ge 2$. Then Lemke’s algorithm can solve LCP$(q, A)$. If rank$(A) = 1$ (that is, $A = ab^T$, $a, b \ne 0$) and $A$ is a copositive matrix, then LCP$(q, A)$ can be processed by Lemke’s algorithm whenever $b_i = 0 \Rightarrow a_i = 0$.


Proof Assume rank$(A) \ge 2$. By Theorem 12.8 and from the proof of Theorem 12.9, it follows that $A$ is either a PSD matrix or $A \le 0$ or $A \in C_0^*$. Thus, LCP$(q, A)$ can be processed by Lemke’s algorithm (see Cottle et al. [5]). Now consider PSBD $\cap\, C_0$ matrices with rank$(A) = 1$, i.e., $A = ab^T$, $a, b \ne 0$, such that $b_i = 0 \Rightarrow a_i = 0$. By Theorem 12.7, we have $A \in C_0^*$. Hence, LCP$(q, A)$ with such matrices can be processed by Lemke’s algorithm.


Theorem 12.10 (Mohan et al. [14]) Let $A \in \mathbb{R}^{n \times n}$ be a PSBD $\cap\, C_0$ matrix with rank$(A) \ge 2$. Then $A$ is a sufficient matrix.


The following example demonstrates that PSBD matrices need not be $P_0$-matrices in general.


Example 6
Consider a matrix $A = [\,\cdots\,]$. Then for any vector $x = (x_1, x_2)^T$,
\[ x^TAx = -4x_1x_2 < 0 \]
implies that $x_1$ and $x_2$ are of the same sign. Since $A^Tx = [\,\cdots\,]$ implies either $A^Tx \le 0$ or $A^Tx \ge 0$, $A$ is a PSBD matrix. But $A$ is not a $P_0$-matrix.


The following example studied by Mohan et al. [14] demonstrates that in general PSBD matrices need not be $Q_0$-matrices.


Example 7
Consider a matrix $A = [\,\cdots\,]$. Then for any vector $x = (x_1, x_2)^T$, $x^TAx < 0$ implies either $A^Tx \le 0$ or $A^Tx \ge 0$. Therefore $A$ is a PSBD matrix. Take $q = [\,\cdots\,]$. It is easy to see that LCP$(q, A)$ is feasible but has no solution. Hence $A \notin Q_0$.


12.5 The Class of Generalized Positive Subdefinite Matrices 


This section is dedicated to the study of the class of generalized positive subdefinite
(GPSBD) matrices introduced by Crouzeix and Komlósi [6]. Further, we study some
properties of GPSBD matrices in the context of LCP. Finally, we show that if a
matrix A can be written as a sum of a copositive-plus MGPSBD matrix with some
additional condition and a copositive matrix and if it satisfies the feasibility condition,
then LCP(q, A) is processable by Lemke's algorithm.


Definition 12.2 (Crouzeix and Komlósi [6]) A matrix $A \in \mathbb{R}^{n \times n}$ is said to be a
GPSBD matrix if there exist nonnegative multipliers $s_i$, $t_i$ with $s_i + t_i = 1$,
$i = 1, 2, \ldots, n$ such that

$$\text{for all } z \in \mathbb{R}^n,\quad z^T A z < 0 \;\Rightarrow\; \begin{cases} \text{either } -s_i z_i + t_i (A^T z)_i \ge 0 \ \ \forall i, \\ \text{or } -s_i z_i + t_i (A^T z)_i \le 0 \ \ \forall i. \end{cases}$$

Remark 12.1 It is easy to prove that the matrix A is PSBD if and only if A is GPSBD
with multipliers $s_i = 0$ and $t_i = 1$ for all i.


Neogy and Das [24] have introduced an equivalent definition of GPSBD matrices.


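Definition 12.3 Let S and T be two nonnegative diagonal matrices with diagonal
entries $s_i$, $t_i$, where $s_i + t_i = 1$ for $i = 1, \ldots, n$. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be
GPSBD if there exist two nonnegative diagonal matrices S and T with $S + T = I$
such that

$$\text{for all } z \in \mathbb{R}^n,\quad z^T A z < 0 \;\Rightarrow\; \begin{cases} \text{either } -Sz + TA^T z \ge 0, \\ \text{or } -Sz + TA^T z \le 0. \end{cases}$$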


Remark 12.2 Note that if $S = 0$, then a GPSBD matrix reduces to a PSBD matrix. A is
called a merely generalized positive subdefinite (MGPSBD) matrix if A is a GPSBD
matrix but not a PSBD matrix.


A nontrivial example of a GPSBD matrix is shown below. 
Example 8 (Neogy and Das [24])


Consider a matrix

$$A = \begin{bmatrix} 0 & 2 & 0 \\ -1 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}.$$

It should be noted that $\nu_-(A + A^T) = 1$. Then for any vector $z = [z_1\ \ z_2\ \ z_3]^T$,
$z^T A z = z_1 z_2 < 0$ implies that $z_1$ and $z_2$ have opposite signs. Clearly, $A^T z =
[-z_2\ \ \ 2z_1 + z_3\ \ \ {-z_2}]^T$, and for $z = [-1\ \ 1\ \ z_3]^T$ with $z_3 > 2$, $z^T A z < 0$ does not
imply that $A^T z$ is unisigned. Thus, A is not a PSBD matrix. However, by selecting $s_1 = 0$, $s_2 = 1$
and $s_3 = 0$, it is easy to see that A is a GPSBD matrix.
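
The claims of Example 8 can be checked mechanically. The following is a minimal sketch in plain Python, not taken from Neogy and Das [24]: the helper names `quad_form`, `gpsbd_vector` and `unisigned`, the test vector `z = [-1, 1, 5]`, and the integer grid are all illustrative assumptions. A finite grid check is only a sanity test, not a proof.

```python
from itertools import product

def quad_form(A, z):
    """Return the quadratic form z^T A z."""
    n = len(A)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def gpsbd_vector(A, s, z):
    """Return -S z + T A^T z with S = diag(s) and T = I - S."""
    n = len(A)
    At_z = [sum(A[j][i] * z[j] for j in range(n)) for i in range(n)]
    return [-s[i] * z[i] + (1 - s[i]) * At_z[i] for i in range(n)]

def unisigned(v):
    """True if all entries are >= 0 or all entries are <= 0."""
    return all(x >= 0 for x in v) or all(x <= 0 for x in v)

A = [[0, 2, 0], [-1, 0, -1], [0, 1, 0]]

# Illustrative vector (an assumption): z^T A z = z1 * z2 = -1 < 0.
z = [-1, 1, 5]
psbd_vec = gpsbd_vector(A, [0, 0, 0], z)   # S = 0 reduces to A^T z
gpsbd_vec = gpsbd_vector(A, [0, 1, 0], z)  # multipliers s = (0, 1, 0) from Example 8

# Sanity check of the GPSBD condition on a small integer grid.
grid = range(-3, 4)
holds_on_grid = all(
    unisigned(gpsbd_vector(A, [0, 1, 0], list(w)))
    for w in product(grid, repeat=3)
    if quad_form(A, list(w)) < 0
)
```

With $s = (0, 1, 0)$ every coordinate of $-Sz + TA^T z$ equals $-z_2$, which is why the unisigned test succeeds for every $z$.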


A careful analysis of the GPSBD matrix definition yields the following result. 


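Proposition 12.3 (Neogy and Das [24]) A matrix $A \in \mathbb{R}^{n \times n}$ is a GPSBD matrix if
any one of the following conditions holds:


(i) $Z_1 \subseteq Z_2$, where $Z_1 = \{z \mid z^T A z < 0\}$ and $Z_2 = \{z \mid z^T(-S + TA^T) z < 0\}$,
and $(-S + AT)$ is PSBD.

(ii) $t_i = k$, $0 < k < 1$, $\forall i$ and $(-S + AT)$ is PSBD.

(iii) $Z = \{z \mid z^T A z < 0\} \subseteq \mathbb{R}^n_+ \cup -\mathbb{R}^n_+$.

(iv) $a_{ii} = 0\ \forall i$ and there exists an $i = i_0$ such that $A_{i_0 \cdot} \ne 0$ and $a_{ij} = 0\ \forall i, j$, $i \ne i_0$, $j \ne i$.

(v) $a_{ii} = 0\ \forall i$ and there exists a $j = j_0$ such that $A_{\cdot j_0} \ne 0$ and $a_{ij} = 0\ \forall i, j$, $i \ne j$, $j \ne j_0$.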

Neogy and Das [24] studied the following result on GPSBD matrices. 


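Theorem 12.11 Let A be a GPSBD matrix. Then for $\alpha \subseteq \{1, \ldots, n\}$, $A_{\alpha\alpha} \in$
GPSBD.


Proof Let $x_\alpha \in \mathbb{R}^{|\alpha|}$ and for $\alpha \subseteq \{1, \ldots, n\}$,

$$A = \begin{bmatrix} A_{\alpha\alpha} & A_{\alpha\bar\alpha} \\ A_{\bar\alpha\alpha} & A_{\bar\alpha\bar\alpha} \end{bmatrix}.$$

Let $x_\alpha^T A_{\alpha\alpha} x_\alpha < 0$. Now define $z \in \mathbb{R}^n$ by taking $z_\alpha = x_\alpha$ and $z_{\bar\alpha} = 0$. Then $z^T A z =
x_\alpha^T A_{\alpha\alpha} x_\alpha$. Since A is a GPSBD matrix, $z^T A z = x_\alpha^T A_{\alpha\alpha} x_\alpha < 0$ implies either
$(-S + TA^T) z \ge 0 \Rightarrow (-S_{\alpha\alpha} + T_{\alpha\alpha} A_{\alpha\alpha}^T) x_\alpha \ge 0$, or $(-S + TA^T) z \le 0 \Rightarrow
(-S_{\alpha\alpha} + T_{\alpha\alpha} A_{\alpha\alpha}^T) x_\alpha \le 0$. Therefore $A_{\alpha\alpha} \in$ GPSBD.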


The above theorem shows that GPSBD is a complete class in the sense of Cottle
et al. [5]. Crouzeix et al. [7] have shown that if A is a PSBD matrix with $\mathrm{rank}(A)
\ge 2$, then $A^T$ is also a PSBD matrix. However, the following example shows that in
general this is not true for GPSBD matrices.
Example 9 (Neogy and Das [24])


Consider a matrix

$$A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 10 & 0 & 1 \end{bmatrix}.$$

Take $z = [1\ \ 3\ \ {-1}]^T$. Clearly, A and $A^T$ are not PSBD matrices. However, for a
suitable choice of the multipliers $s_i$, A is a GPSBD matrix.


Now consider $A^T$. It should be noted that for $z = [1\ \ 2\ \ {-1}]^T$,

$$(-S + TA) z = \begin{bmatrix} (1 - 2s_1) z_1 - (1 - s_1) z_2 \\ -(1 - s_2) z_1 + (1 - 2s_2) z_2 \\ 10(1 - s_3) z_1 + (1 - 2s_3) z_3 \end{bmatrix} = \begin{bmatrix} -1 \\ -3s_2 + 1 \\ -8s_3 + 9 \end{bmatrix}.$$

It is easy to verify that there does not exist any $s_1, s_2, s_3$ for which $A^T$ satisfies the
definition of a GPSBD matrix.
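
Example 9 can be explored numerically. The sketch below, in plain Python with exact fractions, makes two assumptions that are not in the source: the multipliers `s_for_A = (1, 1/2, 1/2)` are one choice under which $-Sz + TA^T z = [-z_1,\ -z_1/2,\ 0]^T$, and the search over $s$ for $A^T$ uses a coarse grid of step $1/10$. Both grid checks are sanity tests, not proofs.

```python
from fractions import Fraction
from itertools import product

def transpose(M):
    return [list(col) for col in zip(*M)]

def quad_form(M, z):
    """Return z^T M z."""
    n = len(M)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

def gpsbd_vector(M, s, z):
    """Return -S z + T M^T z with S = diag(s) and T = I - S."""
    n = len(M)
    Mt_z = [sum(M[j][i] * z[j] for j in range(n)) for i in range(n)]
    return [-s[i] * z[i] + (1 - s[i]) * Mt_z[i] for i in range(n)]

def unisigned(v):
    return all(x >= 0 for x in v) or all(x <= 0 for x in v)

A = [[1, -1, 0], [-1, 1, 0], [10, 0, 1]]
At = transpose(A)

# Assumed multipliers for A (not stated in the source).
half = Fraction(1, 2)
s_for_A = [1, half, half]
grid = range(-4, 5)
A_ok_on_grid = all(
    unisigned(gpsbd_vector(A, s_for_A, list(z)))
    for z in product(grid, repeat=3)
    if quad_form(A, list(z)) < 0
)

# For A^T, the vector z = [1, 2, -1] defeats every s on a coarse grid:
# the first coordinate is always -1 and the third is 9 - 8*s3 >= 1.
z = [1, 2, -1]
s_grid = [Fraction(k, 10) for k in range(11)]
At_fails_on_grid = not any(
    unisigned(gpsbd_vector(At, list(s), z)) for s in product(s_grid, repeat=3)
)
```

Because the first coordinate is $-1$ and the third is at least $1$ for every $s_3 \in [0, 1]$, the failure for $A^T$ does not depend on the grid resolution.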


Mohan et al. [14] have proved that a copositive PSBD matrix of rank $\ge 2$ is
sufficient. However, the following example demonstrates that a GPSBD matrix is
not always sufficient.


Example 10 


Consider the GPSBD matrix A from Example 9. We note that A is not a PSBD 
matrix. It is easy to verify that A is a row sufficient matrix, though A is not a column 
sufficient matrix. 


We study the following result on row sufficiency, proved by Neogy and Das [24]. 


Theorem 12.12 Let $A \in$ MGPSBD $\cap\, C_0$ with $0 < t_i < 1\ \forall i$. Then A is a row sufficient
matrix.


Proof Assume $z_i (A^T z)_i \le 0$ for all $i = 1, \ldots, n$. Let $I_1 = \{i : z_i > 0\}$ and $I_2 =
\{i : z_i < 0\}$. Now consider three cases.

Case (i): $I_2 = \emptyset$. Then $z^T A z = z^T A^T z = \sum_i z_i (A^T z)_i \le 0$. Since $A \in C_0$,
$z_i (A^T z)_i = 0\ \forall i$.

Case (ii): $I_1 = \emptyset$. Then $(-z)^T A^T (-z) = z^T A^T z = \sum_i z_i (A^T z)_i \le 0$. Since $A \in
C_0$, $z_i (A^T z)_i = 0\ \forall i$.

Case (iii): Let us assume that there exists a vector z such that $z_i (A^T z)_i \le 0$ for all
$i = 1, 2, \ldots, n$ and $z_k (A^T z)_k < 0$ for at least one $k \in \{1, 2, \ldots, n\}$. Let $I_1 \ne \emptyset$ and
$I_2 \ne \emptyset$. Then $z^T A z = z^T A^T z = \sum_i [z_i (A^T z)_i] < 0$, which implies $-s_i z_i + t_i (A^T z)_i \ge
0\ \forall i$ or $-s_i z_i + t_i (A^T z)_i \le 0\ \forall i$.


Without loss of generality, assume that $-s_i z_i + t_i (A^T z)_i \ge 0\ \forall i$. Then for all
$i \in I_1$, $-s_i z_i^2 + t_i z_i (A^T z)_i \ge 0$, which implies $z_i (A^T z)_i \ge \frac{s_i}{t_i} z_i^2 > 0\ \forall i \in I_1$.


Therefore, $\sum_{i \in I_1} [z_i (A^T z)_i] > 0$. Since $z_i (A^T z)_i \le 0$ for all $i = 1, \ldots, n$, it results
in a contradiction. Thus, $z_i (A^T z)_i = 0\ \forall i$. Hence A is row sufficient.


The following example demonstrates that the assumption $0 < t_i < 1\ \forall i$ in the
above theorem cannot be relaxed.


Example 11 (Neogy and Das [24]) 


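Let

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 10 & 0 & 0 \end{bmatrix}.$$

It is easy to see that A is copositive but not a row sufficient matrix. For $z =
[-1\ \ {-1}\ \ 1]^T$, $z^T A z < 0$. Clearly, A is not PSBD. Now for $z = [-1\ \ {-1}\ \ 1]^T$,

$$(-S + TA^T) z = \begin{bmatrix} (1 - 2s_1) z_1 + (1 - s_1)\, 2z_2 + (1 - s_1)\, 10z_3 \\ (1 - 2s_2) z_2 \\ -s_3 z_3 \end{bmatrix} = \begin{bmatrix} 7 - 6s_1 \\ 2s_2 - 1 \\ -s_3 \end{bmatrix}.$$

Note that there does not exist $s_1, s_2, s_3$, where $0 < s_i < 1\ \forall i$, i.e., there does not
exist $t_1, t_2, t_3$ where $0 < t_i < 1\ \forall i$, for which the definition of GPSBD matrix is
satisfied. However, by selecting $s_1 = \frac{1}{2}$, $s_2 = \frac{1}{2}$, $s_3 = 0$, A is a MGPSBD matrix.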
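
The failure of row sufficiency in Example 11 can be exhibited by a single vector. The sketch below uses the condition from the proof of Theorem 12.12 (A is row sufficient when $z_i (A^T z)_i \le 0$ for all $i$ forces $z_i (A^T z)_i = 0$ for all $i$). The witness `z = [1, 0, -1]` and the function names are assumptions introduced here for illustration; they do not appear in Neogy and Das [24].

```python
def row_products(A, z):
    """Return the list of products z_i * (A^T z)_i."""
    n = len(A)
    At_z = [sum(A[j][i] * z[j] for j in range(n)) for i in range(n)]
    return [z[i] * At_z[i] for i in range(n)]

def violates_row_sufficiency(A, z):
    """True if all products are <= 0 and at least one is strictly negative."""
    p = row_products(A, z)
    return all(x <= 0 for x in p) and any(x < 0 for x in p)

A = [[1, 0, 0], [2, 1, 0], [10, 0, 0]]

# Assumed witness: A^T z = [1 - 10, 0, 0], so the products are [-9, 0, 0].
witness = [1, 0, -1]
products = row_products(A, witness)
is_violated = violates_row_sufficiency(A, witness)
```

A single violating vector is enough to rule out row sufficiency, whereas confirming row sufficiency would require an argument over all $z$.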


Murthy and Parthasarathy [17] have studied the following characterization on 
nonnegative Qo-matrices, which is useful in the subsequent theorem. 


Theorem 12.13 (Murthy and Parthasarathy [17]) Suppose $A \in \mathbb{R}^{n \times n}$ is nonnegative, where
$n \ge 2$. Then $A \in Q_0$ if and only if for every $i \in \{1, 2, \ldots, n\}$, $A_{i \cdot} \ne 0 \Rightarrow a_{ii} > 0$.
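
The criterion of Theorem 12.13 translates directly into a row-by-row test. The following is a minimal sketch; the function name `is_nonnegative_q0` and the second test matrix `B` are illustrative assumptions. The first matrix is the one from Example 11.

```python
def is_nonnegative_q0(A):
    """Apply the test of Theorem 12.13 to a nonnegative square matrix (n >= 2):
    every nonzero row must have a positive diagonal entry."""
    n = len(A)
    if n < 2 or any(len(row) != n for row in A):
        raise ValueError("expected a square matrix of order n >= 2")
    if any(x < 0 for row in A for x in row):
        raise ValueError("the criterion applies to nonnegative matrices only")
    return all(A[i][i] > 0 for i in range(n) if any(x != 0 for x in A[i]))

A = [[1, 0, 0], [2, 1, 0], [10, 0, 0]]   # matrix of Example 11
B = [[1, 0, 0], [2, 1, 0], [0, 0, 0]]    # assumed variant with a zero third row

a_in_q0 = is_nonnegative_q0(A)   # third row is nonzero but a_33 = 0
b_in_q0 = is_nonnegative_q0(B)   # the zero row imposes no condition
```

The first call reports that the matrix of Example 11 fails the criterion, which is consistent with that matrix not being row sufficient.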


Theorem 12.14 (Neogy and Das [24]) Suppose A is a nonnegative MGPSBD matrix
with $0 < t_i < 1\ \forall i$. Then $A \in C_0^*$.


Proof From Theorem 12.12, A is row sufficient. Therefore, $A \in Q_0$ (Cottle et al.
[4, p. 239]). Now from Theorem 12.13, for any nonnegative $Q_0$-matrix, $A_{i \cdot} \ne 0 \Rightarrow
a_{ii} > 0$. Let $\alpha = \{i \mid a_{ii} > 0\}$. Then $x^T A x = 0$ for $x_\alpha = 0$. It is easy to verify that
for $x = (0, x_{\bar\alpha})$, $Ax \ge 0 \Rightarrow A^T x \le 0$. Hence $A \in C_0^*$.

From Theorem 12.12, we conclude that LCP(q, A) where $A \in C_0\, \cap$ MGPSBD
with $0 < t_i < 1$ is processable by Lemke's algorithm.


Neogy and Das [24] obtained the following result for solving LCP(q, A) by 
Lemke’s algorithm when A satisfies certain conditions stated in the theorem below. 


Theorem 12.15 (Neogy and Das [24]) Assume $A \in \mathbb{R}^{n \times n}$ can be written as $M + N$,
where $M \in$ MGPSBD $\cap\, C_0^+$ is nondegenerate with $0 < t_i < 1\ \forall i$, and $N \in C_0$.
If the system $q + Mx - Ny \ge 0$, $y \ge 0$, is feasible, then Lemke's algorithm for
LCP(q, A) with covering vector $d > 0$ terminates with a solution.


The class MGPSBD $\cap\, C_0^+$ is shown to be nonempty in the following example.


Example 12 (Neogy and Das [24]) 


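Consider the copositive-plus matrix

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 8 & 0 & 1 \end{bmatrix}.$$

Take $z = [-1\ \ {-1}\ \ 1]^T$. Note that A is not a PSBD matrix. However, by selecting $s_i =
\frac{1}{2}\ \forall i$, A is a MGPSBD matrix.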


12.6 The Class of WGPSBD Matrices 


In this section, we study the class of weak generalized positive subdefinite matrices 
(WGPSBD) introduced by Neogy and Das [26]. WGPSBD is a weaker version of 
the class of GPSBD matrices. Row sufficient matrices, proposed by Cottle et al. [4], 
also capture this weaker class of matrices. 


Definition 12.4 A matrix $A \in \mathbb{R}^{n \times n}$ is called a weak generalized positive subdefinite
(WGPSBD) matrix if there exist two nonnegative diagonal matrices S and T with
$S + T = I$ such that for all $z \in \mathbb{R}^n$, $z^T A z < 0$ implies at least $(n - 1)$ coordinates
of the vector $(-Sz + TA^T z)$ are unisigned.


Note 12.1 WGPSBD contains GPSBD. 


Remark 12.3 A matrix A is called a merely weak generalized positive subdefinite
(MWGPSBD) matrix if $A \in$ WGPSBD but $A \notin$ GPSBD.

Another version of WGPSBD matrices can be defined as follows.

Definition 12.5 A matrix $A \in \mathbb{R}^{n \times n}$ is said to be a WGPSBD matrix if there exist
nonnegative multipliers $s_i$, $t_i$ with $s_i + t_i = 1$, $i = 1, 2, \ldots, n$ such that

$$\text{for all } z \in \mathbb{R}^n,\quad z^T A z < 0 \text{ implies } \begin{cases} \text{either at least } (n-1) \text{ coordinates of } -s_i z_i + t_i (A^T z)_i \ge 0,\ i \in F \subseteq \{1, 2, \ldots, n\}, \\ \text{or at least } (n-1) \text{ coordinates of } -s_i z_i + t_i (A^T z)_i \le 0,\ i \in F \subseteq \{1, 2, \ldots, n\}. \end{cases}$$

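The example below is a nontrivial example of WGPSBD $\cap\, P_0$.

Example 13 (Neogy and Das [26])


Consider the matrix

$$A = \begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

It is easy to verify that A is a copositive matrix. Now we show that $A \notin$ GPSBD.


It should be noted that for $z = [1, \tfrac{1}{4}, -1]^T$, $z^T A z < 0$.

$$(-S + TA^T) z = \begin{bmatrix} (1 - 2s_1) z_1 - (1 - s_1) z_2 \\ -(1 - s_2) z_1 + (1 - 2s_2) z_2 \\ 2(1 - s_3) z_1 + (1 - 2s_3) z_3 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{4} - \tfrac{7}{4} s_1 \\ -\tfrac{3}{4} + \tfrac{1}{2} s_2 \\ 1 \end{bmatrix}.$$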


It is clear that there are no s;, s2, s3 for which A meets the criteria for a GPSBD 
matrix. However, by definition A is a WGPSBD matrix. 


Neogy and Das [26] studied the following result on WGPSBD matrices. 


Theorem 12.16 Let A be a WGPSBD matrix. Then $A_{\alpha\alpha} \in$ WGPSBD, where $\alpha \subseteq
\{1, \ldots, n\}$.


12.7 The Class of GPSBD Matrices of Level k


This section introduces the class of generalized positive subdefinite (GPSBD) matrices
of level k, studied by Dubey and Neogy [9]. This class is a subclass of row
sufficient matrices, which is not a subclass of copositive matrices. Dubey and Neogy
[9] relaxed the definition of a GPSBD matrix and defined a GPSBD matrix of level
k to identify subclasses of row sufficient matrices.

Definition 12.6 (Dubey and Neogy [9]) A matrix $A \in \mathbb{R}^{n \times n}$ is called a GPSBD
matrix of level k (GPSBD$_k$), $1 \le k \le n$, if there exist two nonnegative diagonal
matrices S and T with $S + T = I$ such that for all $z \in \mathbb{R}^n$, $z^T A z < 0$ implies exactly
k coordinates of the vector $(-Sz + TA^T z)$ are nonnegative or nonpositive and the
remaining $(n - k)$ coordinates of the vector $(-Sz + TA^T z)$ are negative or positive,
respectively.


The class of GPSBD matrices of level k is denoted by GPSBD$_k$. It should be
noted that for $k = n$, GPSBD$_n$ = GPSBD and WGPSBD = GPSBD$_{n-1}$ $\cup$ GPSBD.
The following example shows that this class is nonempty.


Example 14 (Dubey and Neogy [9]) 


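Let

$$A = \begin{bmatrix} 1 & -1 & 10 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}.$$

Note that

$$-Sz + TA^T z = \begin{bmatrix} (1 - 2s_1) z_1 + (1 - s_1) z_2 + (1 - s_1) z_3 \\ -(1 - s_2) z_1 + (1 - 2s_2) z_2 \\ 10(1 - s_3) z_1 + (1 - 2s_3) z_3 + (1 - s_3) z_4 \\ -s_4 z_4 - (1 - s_4) z_3 \\ (1 - 2s_5) z_5 + (1 - s_5) z_6 + (1 - s_5) z_7 \\ -(1 - s_6) z_5 + (1 - 2s_6) z_6 \\ -s_7 z_7 - (1 - s_7) z_5 \end{bmatrix}.$$

For $z = [1\ \ {-1}\ \ {-1}\ \ {-1}\ \ 1\ \ {-1}\ \ {-1}]^T$, $z^T A z < 0$ and

$$-Sz + TA^T z = \begin{bmatrix} -1 \\ -2 + 3s_2 \\ 8 - 7s_3 \\ 1 \\ -1 \\ -2 + 3s_6 \\ -1 + 2s_7 \end{bmatrix}.$$

It is easy to see that there does not exist any $0 \le s_i \le 1$ for which this vector is unisigned; therefore, A is not a
GPSBD matrix. However, for $s_1 = 1$, $s_2 = \frac{1}{2}$, $s_3 = 1$, $s_4 = 0$, $s_5 = 1$, $s_6 = \frac{1}{2}$, $s_7 =
0$, A satisfies the definition of a GPSBD$_5$ matrix. As we have

$$-Sz + TA^T z = \begin{bmatrix} -z_1 \\ -\tfrac{1}{2} z_1 \\ -z_3 \\ -z_3 \\ -z_5 \\ -\tfrac{1}{2} z_5 \\ -z_5 \end{bmatrix}$$

and for $z^T A z = z_1^2 + z_2^2 + z_3^2 + z_5^2 + z_6^2 + 11 z_1 z_3 < 0$, $z_1$ and $z_3$ have to be of opposite
sign.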
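
The computations of Example 14 can be reproduced with exact arithmetic. This is a minimal sketch under stated assumptions: the helper names and the test `has_level` (which encodes the sign-count condition of Definition 12.6) are introduced here for illustration, and the exhaustive check only ranges over vectors with entries in $\{-1, 0, 1\}$, so it is a sanity test rather than a proof.

```python
from fractions import Fraction
from itertools import product

A = [
    [1, -1, 10, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, -1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, -1],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0],
]
half = Fraction(1, 2)
s = [1, half, 1, 0, 1, half, 0]   # multipliers quoted in Example 14

def quad_form(M, z):
    n = len(M)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

def gpsbd_vector(M, s, z):
    """Return -S z + T M^T z with S = diag(s) and T = I - S."""
    n = len(M)
    Mt_z = [sum(M[j][i] * z[j] for j in range(n)) for i in range(n)]
    return [-s[i] * z[i] + (1 - s[i]) * Mt_z[i] for i in range(n)]

def has_level(v, k):
    """True if exactly k entries are >= 0 (so the rest are < 0),
    or exactly k entries are <= 0 (so the rest are > 0)."""
    return sum(x >= 0 for x in v) == k or sum(x <= 0 for x in v) == k

z0 = [1, -1, -1, -1, 1, -1, -1]
q0 = quad_form(A, z0)
v0 = gpsbd_vector(A, s, z0)

# Sanity check on all sign patterns with entries in {-1, 0, 1}.
level5_on_grid = all(
    has_level(gpsbd_vector(A, s, list(z)), 5)
    for z in product((-1, 0, 1), repeat=7)
    if quad_form(A, list(z)) < 0
)
```

For the vector `z0` the result has five nonpositive and two positive coordinates, matching the level 5 claimed in the example.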


We know that the class of GPSBD matrices is complete. However, for a GPSBD$_k$
matrix, Dubey and Neogy [9] proved a theorem which says that every principal
submatrix of a GPSBD matrix of level k is also a GPSBD matrix of some level (say
$k_1$, which may or may not be equal to k). This ensures the completeness of the class
$\bigcup_{k} \mathrm{GPSBD}_k$.


Theorem 12.17 (Dubey and Neogy [9]) Suppose $A \in \mathbb{R}^{n \times n}$ is a GPSBD$_k$ matrix,
$1 \le k \le n$. Then $A_{\alpha\alpha} \in$ GPSBD$_{k_1}$, where $\alpha \subseteq \{1, \ldots, n\}$ and $1 \le k_1 \le |\alpha|$.


12.8 The Class of Fully Copositive Matrices 


The aim of this section is to study the class of fully copositive ($C_0^f$) matrices and their
properties. Further, we demonstrate that the fully copositive matrices with positive
diagonal entries are column sufficient and fully copositive matrices which belong to
$Q_0$ are sufficient.

Positive semidefinite matrices are known to be sufficient. Murthy et al. [20]
demonstrated that $C_0^f \cap Q_0$ matrices are sufficient. Further, Mohan et al. [15] have
shown how this result follows from the one that Cottle and Guu [3] proved earlier.


Theorem 12.18 A matrix $A \in \mathbb{R}^{n \times n}$ is sufficient if and only if every matrix obtained
from it by means of a PPT operation is sufficient of order 2.


We need the following result to prove the subsequent theorem. 


Theorem 12.19 (Murthy and Parthasarathy [18]) If $A \in \mathbb{R}^{2 \times 2} \cap C_0^f \cap Q_0$, then A
is a PSD matrix.


As a consequence, Mohan et al. [15] studied the following theorem.


Theorem 12.20 Let $A \in C_0^f \cap Q_0$. Then A is sufficient.

Proof Since A and all its PPTs are completely $Q_0$-matrices, all $2 \times 2$
submatrices of A or its PPTs are $C_0^f \cap Q_0$-matrices. From Theorem 12.19, all $2 \times 2$
submatrices of A or their PPTs are PSD and so sufficient. Thus, A or every matrix
obtained by a PPT operation is sufficient of order 2. Hence from Theorem 12.18, A
is sufficient.


In view of the above result, we observe that if $A \in C_0^f \cap Q_0$, then it is processable
by Lemke's algorithm.


Murthy and Parthasarathy [17] studied the following theorem. 


Theorem 12.21 Let $A \in \mathbb{R}^{n \times n}$ be a fully copositive matrix. Assume that $a_{ii} > 0\ \forall i \in
\{1, 2, \ldots, n\}$. Then $A \in P_0$.


Contrary to what was stated above, we see that a $C_0^f$-matrix is a column sufficient
matrix under the assumption of positive diagonal entries and that if a matrix A with
positive diagonal entries and its transpose are in $C_0^f$, then such a matrix is in $Q_0$ and
hence it is a completely $Q_0$-matrix.


Theorem 12.22 (Mohan et al. [15]) Suppose $A \in \mathbb{R}^{n \times n}$ is a fully copositive matrix.
Assume that $a_{ii} > 0\ \forall i \in \{1, 2, \ldots, n\}$. Then the following hold:


(i) A is column sufficient.
(ii) In addition, if $A^T \in \mathbb{R}^{n \times n} \cap C_0^f$, then A is a completely $Q_0$-matrix.


The following example demonstrates how it is important to assume that A’ is also 
a fully copositive matrix in the previous theorem in order to get a stronger conclusion 
in (ii). 


Example 15 
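Consider the matrix

$$A = \begin{bmatrix} 1 & -1 & 2 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

It is easy to see that $A \in C_0^f$ but $A \notin Q_0$. For example, for the vector

$$q = \begin{bmatrix} -8 \\ -5 \\ 2 \end{bmatrix},$$

LCP(q, A) is feasible but has no solution. Also, it is easy to see that $A^T \notin C_0^f$.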
12.9 The Class of Singular $N_0$-Matrices


In this section, we study the class of singular $N_0$-matrices with Q-property. Pye [29]
investigated nonsingular $N_0$-matrices and posed an obvious query on the connection
to the class Q. The fact that singular $N_0$-matrices of orders 1 and 2 are not Q-matrices
is easy to see. In his discussion of singular $N_0 \cap Q$-matrices, Pye [29] provides
examples and observes that if A is a singular $N_0$-matrix and all of its principal
minors are zero, then $A \notin Q$. Pye offers two examples of singular $N_0$-matrices
$A_1$ and $A_2$ where


-12 1 -111 
Ai=] 2-1-3], A42.=|] 1 00 
1 0 0 1 00 


Here, $A_1 \in Q$, but $A_2 \notin Q$.


The following lemma is useful in the subsequent theorem. 


Lemma 12.6 (Neogy and Das [25], p. 815) Let $A \in \mathbb{R}^{n \times n}$ ($n \ge 3$) be an $N_0$-matrix.
If A is nonsingular, then there exists $\emptyset \ne \alpha \subset \{1, 2, \ldots, n\}$ so that A can be written
in the partitioned form (if necessary, after a principal rearrangement of its rows and
columns)

$$A = \begin{bmatrix} A_{\alpha\alpha} & A_{\alpha\bar\alpha} \\ A_{\bar\alpha\alpha} & A_{\bar\alpha\bar\alpha} \end{bmatrix},$$

where $A_{\alpha\alpha} \le 0$, $A_{\bar\alpha\bar\alpha} \le 0$, $A_{\bar\alpha\alpha} \ge 0$ and $A_{\alpha\bar\alpha} \ge 0$.

Neogy and Das [25] studied the following result on singular $N_0$-matrices.


Theorem 12.23 (Neogy and Das [25]) Let $A \in \mathbb{R}^{n \times n}$ be a singular $N_0$-matrix ($n \ge
3$) with $v(A) > 0$. Suppose that A has the partitioned form as in Lemma 12.6 and the
values of the games, determined by the proper principal submatrices of A of order
2 or more which contain at least one positive entry, are positive. If $A \in R_0$, then
$A \in Q$.


Using the above theorem in the following example, we deduce that the given 
matrix A € Q. 


Example 16 (Neogy and Das [25]) 


Consider the matrix 
-42 4 5 


1 0 0 0 
2.1—-1-1-1 
2 -1-1-1 


A= 


Clearly, A is a singular $N_0$-matrix with partitioned form as in Lemma 12.6 and $v(A) >
0$. It is easy to see that $A \in R_0$. Also, the values of proper principal submatrices of
order ($\ge 2$) with at least one positive entry are positive. Hence by Theorem 12.23, it
follows that $A \in Q$.


Theorem 12.24 (Neogy and Das [25]) Let $A \in \mathbb{R}^{n \times n}$ ($n \ge 3$). If $A \in N_0 \cap Q$, then
$A \in R_0$.


12.10 Almost Type Classes of Matrices 


This section is dedicated to the study of almost type classes of matrices with Q-property.
Neogy and Das [22] introduced the class of almost $\bar{N}$-matrices, which is a
subclass of the almost $N_0$-matrices, and obtained a sufficient condition for it to hold the
Q-property. The generalization of almost type class of matrices is called exact order
matrices. In particular, the matrices of exact order 1 are called almost type class of
matrices.


Definition 12.7 A matrix $A \in \mathbb{R}^{n \times n}$ is called an almost $\bar{N}$-matrix if there exists a
sequence $\{A^{(k)}\}$ where $A^{(k)} = [a_{ij}^{(k)}]$ are almost N-matrices such that $a_{ij}^{(k)} \to a_{ij}$ for
all $i, j \in \{1, 2, \ldots, n\}$.


Example 17 (Neogy and Das [22]) 


Consider the matrix 


It should be noted that A is an almost $N_0$-matrix. Also, it is easy to verify that
A is an almost $\bar{N}$-matrix, since the matrix A can be obtained as a limit point of the
sequence of almost N-matrices


af 2-2 
AW =] 1/k-1/k 2 
a 


which converges to A as $k \to \infty$.


The following example demonstrates that the matrix class almost $\bar{N}$ is a proper
subclass of the matrix class almost $N_0$.

Example 18 (Neogy and Das [22])


Consider the matrix 


Note that A is an almost $N_0$-matrix. However, it is easy to see that A is not an almost
$\bar{N}$-matrix because A cannot be obtained as a limit point of a sequence of almost
N-matrices.


Now, we take almost $N_0$-matrices into consideration and propose the following
question. Let $A \in$ almost $N_0$. Then, is it true that


(i) $A \in Q \Rightarrow A \in R_0$?
(ii) $A \in R_0 \Rightarrow A \in Q$?


We partially resolve the above-mentioned problems in the sequel. The following
example shows that $A \in$ almost $N_0 \cap Q$, but $A \notin R_0$.


Example 19 (Neogy and Das [22]) 
Let 


Since $\wp_\alpha(A)$, a PPT of A, is in Q, it follows that $A \in Q$. However, $(0, 1, 0, 0)$ solves
LCP(0, A), which implies $A \notin R_0$.


Olech et al. [27, p. 120] demonstrate that an almost $N_0$-matrix, even with a positive
value, does not necessarily belong to the class Q or $R_0$.

Example 20


Consider the following matrix A with vector q, 


59 99 —1001 
f= 3 _ | —500 
=| 9 378 1% 2) 500 

2 3 30 500 


It is easy to verify that $A \in$ almost $N_0$, but $A \notin Q$ even though $v(A) > 0$. Also,
$A \notin R_0$.


However, in the following theorem, Neogy and Das [22] proved that if $A \in$ almost
$\bar{N} \cap R_0$ and $v(A) > 0$, then $A \in Q$.


Theorem 12.25 Suppose $A \in$ almost $\bar{N} \cap \mathbb{R}^{n \times n}$, $n \ge 4$ with $v(A) > 0$. Then $A \in Q$
if $A \in R_0$.


12.10.1 A Generalization of Almost Type Class of Matrices


Neogy and Das [22] introduced the $\bar{N}$-matrix of exact order 2, which is a generalization
of the almost $\bar{N}$-matrix. This class can be obtained as a limit of a sequence of
N-matrices of exact order 2.


Definition 12.8 A matrix $A \in \mathbb{R}^{n \times n}$ is called an $\bar{N}$-matrix of exact order 2 if there
exists a sequence $\{A^{(k)}\}$ where $A^{(k)} = [a_{ij}^{(k)}]$ are N-matrices of exact order 2 such
that $a_{ij}^{(k)} \to a_{ij}$ for all $i, j \in \{1, 2, \ldots, n\}$.


Example 21 


Consider the matrix 
0 -—90 —80 —70 0 
90 —2 -—2 2 2 
A= 70 —2 1 3 3 
—50 -—2 -—3 —0.83 
0 2 3 3 0 


Note that A is an $N_0$-matrix of exact order 2. Further, since A can be obtained as a
limit point of the sequence of N-matrices of exact order 2 given below, A is an $\bar{N}$-matrix
of exact order 2:
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1/k? —90 —80 —70  1/k 
90 -2 -2 -2 2 
A® = 70 -2 -1 -3 3 
50 -2 -3 -08 3 
Ve 2 3 3 =1P 


which converges to A as $k \to \infty$.


The following example demonstrates that an $N_0$-matrix of exact order 2 need not
be an $\bar{N}$-matrix of exact order 2.


Example 22 
Consider the matrix 
00 0 12 
00 -1-12 
A=j]0-10 -11 
1-1-1 00 
21 0 00 


Note that A is an $N_0$-matrix of exact order 2. However, since A cannot be obtained
as the limit of a sequence of N-matrices of exact order 2, A is not an $\bar{N}$-matrix of
exact order 2.


In the following theorem, Neogy and Das [22] proved a sufficient condition for an
$\bar{N}$-matrix of exact order 2 to hold the Q-property.


Theorem 12.26 (Neogy and Das [22]) Let $A \in \bar{N} \cap \mathbb{R}^{n \times n}$, $n \ge 5$, be of exact order 2
with $v(A) > 0$. Suppose there exists at most one nonpositive principal submatrix of
order $n - 1$ and the values of proper principal submatrices of order ($\ge 2$), which
contain at least one positive entry, are positive. Then $A \in Q$ if $A \in R_0$.


12.11 Principal Pivot Transforms of some Classes of 
Matrices 


The aim of this section is to study and characterize various classes of matrices that
are defined based on the principal pivot transforms (PPTs). Neogy and Das [23] look
at a few additional classes that were defined using PPTs. One of these classes has
the characteristic that all of its PPTs are either $C_0$ or almost $C_0$, with at least one
PPT being almost $C_0$. The other class that is taken into consideration in this study
has the property that all of its PPTs are either $E_0$ or almost $C_0$, with at least one PPT
being almost $C_0$. Take note that an almost $C_0$-matrix is not always an $E_0$-matrix.
By demonstrating that this class is in $P_0$, Neogy and Das [23] proved that if it also
belongs to $Q_0$, then it is in $E_0^f$.


Definition 12.9 A matrix $A \in \mathbb{R}^{n \times n}$ is called an almost fully copositive (almost
$C_0^f$)-matrix if its PPTs are either $C_0$ or almost $C_0$ and there exists at least one PPT
$\wp_\alpha(A)$ of A for some $\alpha \subset \{1, 2, \ldots, n\}$ that is almost $C_0$.


Example 23 (Neogy and Das [23]) 


Consider the matrix 


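$$A = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}.$$

It is easy to see that A is an almost fully copositive matrix.


Theorem 12.27 (Neogy and Das [23]) Let $\wp_\alpha(A) \in \mathbb{R}^{n \times n}$ be a PPT of a given matrix
$A \in \mathbb{R}^{n \times n}$. Then $v(A) > 0$ if and only if $v(\wp_\alpha(A)) > 0$.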


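Proof It is sufficient to show that $v(A) > 0 \Rightarrow v(\wp_\alpha(A)) > 0$. Assume $v(A) > 0$,
then there exists a $z > 0$ such that $Az > 0$.


Let

$$\begin{bmatrix} w_\alpha \\ w_{\bar\alpha} \end{bmatrix} = \begin{bmatrix} A_{\alpha\alpha} & A_{\alpha\bar\alpha} \\ A_{\bar\alpha\alpha} & A_{\bar\alpha\bar\alpha} \end{bmatrix} \begin{bmatrix} z_\alpha \\ z_{\bar\alpha} \end{bmatrix}.$$

On premultiplying by

$$\begin{bmatrix} (A_{\alpha\alpha})^{-1} & 0 \\ -A_{\bar\alpha\alpha} (A_{\alpha\alpha})^{-1} & I_{\bar\alpha\bar\alpha} \end{bmatrix}$$

and rewriting we get

$$\begin{bmatrix} z_\alpha \\ w_{\bar\alpha} \end{bmatrix} = \begin{bmatrix} A'_{\alpha\alpha} & A'_{\alpha\bar\alpha} \\ A'_{\bar\alpha\alpha} & A'_{\bar\alpha\bar\alpha} \end{bmatrix} \begin{bmatrix} w_\alpha \\ z_{\bar\alpha} \end{bmatrix},$$

where $A'_{\alpha\alpha} = (A_{\alpha\alpha})^{-1}$, $A'_{\alpha\bar\alpha} = -(A_{\alpha\alpha})^{-1} A_{\alpha\bar\alpha}$, $A'_{\bar\alpha\alpha} = A_{\bar\alpha\alpha} (A_{\alpha\alpha})^{-1}$, $A'_{\bar\alpha\bar\alpha} =
A_{\bar\alpha\bar\alpha} - A_{\bar\alpha\alpha} (A_{\alpha\alpha})^{-1} A_{\alpha\bar\alpha}$.


Since $\begin{bmatrix} w_\alpha \\ z_{\bar\alpha} \end{bmatrix} > 0$ and $\begin{bmatrix} z_\alpha \\ w_{\bar\alpha} \end{bmatrix} > 0$, it follows that $v(\wp_\alpha(A)) > 0$.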
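
The block formulas for a principal pivot transform used in this proof can be sketched numerically. The code below assumes NumPy is available; the function name `principal_pivot_transform` and the test data are illustrative assumptions, and the matrix is the one from Example 24. It checks the defining relation between $(w_\alpha, z_{\bar\alpha})$ and $(z_\alpha, w_{\bar\alpha})$ and that pivoting twice on the same index set returns the original matrix.

```python
import numpy as np

def principal_pivot_transform(A, alpha):
    """Return the PPT of A with respect to the index list alpha.
    Requires the principal submatrix A[alpha, alpha] to be nonsingular."""
    n = A.shape[0]
    a = list(alpha)
    b = [i for i in range(n) if i not in a]
    inv = np.linalg.inv(A[np.ix_(a, a)])
    P = np.empty((n, n), dtype=float)
    P[np.ix_(a, a)] = inv
    P[np.ix_(a, b)] = -inv @ A[np.ix_(a, b)]
    P[np.ix_(b, a)] = A[np.ix_(b, a)] @ inv
    P[np.ix_(b, b)] = A[np.ix_(b, b)] - A[np.ix_(b, a)] @ inv @ A[np.ix_(a, b)]
    return P

A = np.array([[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, -2.0, 1.0]])
alpha = [0]
P = principal_pivot_transform(A, alpha)

# Check (z_alpha, w_alphabar) = P (w_alpha, z_alphabar) for w = A z.
z = np.array([1.0, 2.0, 3.0])
w = A @ z
in_alpha = np.isin(np.arange(3), alpha)
lhs = np.where(in_alpha, z, w)
rhs = P @ np.where(in_alpha, w, z)
```

The transform only exchanges the roles of $w_\alpha$ and $z_\alpha$, which is why positivity of the pair $(z, Az)$ carries over to the pivoted system in the proof.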


Murthy et al. [19] proved that if $A \in Q$, then $v(A) > 0$. Since any PPT $\wp_\alpha(A)$
of a Q-matrix is again a Q-matrix, for any Q-matrix $v(\wp_\alpha(A)) > 0$ in
all its PPTs $\wp_\alpha(A)$. However, it is simple to demonstrate that for any matrix A with
$v(A) > 0$, A is a Q-matrix if and only if A is a $Q_0$-matrix.

Theorem 12.28 (Neogy and Das [23]) Let $A \in \mathbb{R}^{n \times n}\, \cap$ almost $C_0^f \cap Q_0$ ($n \ge 3$).
Then $A \in P_0$.


Now consider the matrix class whose members have PPTs that are either $E_0$ or
almost $C_0$ with at least one PPT that is almost $C_0$. The example below demonstrates
that this class is nonempty.


Example 24 (Neogy and Das [23]) 


Let

$$A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}.$$

It is easy to see that all its PPTs are either $E_0$ or almost $C_0$. Also, $A \in Q_0$.


Theorem 12.29 (Neogy and Das [23]) Let $A \in \mathbb{R}^{n \times n} \cap Q_0$ ($n \ge 3$) and the PPTs of
A be either $E_0$ or almost $C_0$ with at least one PPT almost $C_0$. Then A is a $P_0$-matrix.


12.12 A List of Open Problems 


In this section, we present some open problems. Dubey and Neogy [8] noticed that 
the class of hidden Z-matrices is not complete. However, Pang [28] proved that the 
subclass hidden K of hidden Z is complete. The below example demonstrates that 
the subclass of hidden Z-matrices which is complete properly contains the subclass 
hidden K. 


Example 25 (Dubey and Neogy [8]) 


Consider the matrix 


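$$A = \begin{bmatrix} 0 & -2 & -2 \\ -1 & 2 & 1 \\ -1 & 0 & 2 \end{bmatrix}$$

with $X = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}$ and $AX = Y = \begin{bmatrix} 2 & -2 & 0 \\ -2 & 2 & -1 \\ -3 & 0 & 2 \end{bmatrix}$.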
3 1 
Taker =|1]|ands=| 1 , then 7X +57Y =[261]. Hence A is a hidden 
2 1 


Z-matrix. It should be noted that A is not a P-matrix. But every principal submatrix 
of A is a hidden Z-matrix. 
Now, we present the following open problems.

Problem 12.1 Characterize the class of hidden $Z \setminus P$-matrices, which is complete.


Problem 12.2 Mohan [13] demonstrated that if $A \in Z$, then LCP(q, A) can be
solved by applying the simplex method to the problem

$$\begin{aligned} &\text{Minimize } z_0 \\ &\text{subject to } w - Az - e_n z_0 = q, \\ &\phantom{\text{subject to }} w, z, z_0 \ge 0, \end{aligned}$$


where e, denotes the column vector of all 1’s. Indeed, it is shown that the sim- 
plex algorithm and Lemke’s algorithm provide the same sequence of basic feasible 
solutions. Can we extend this result to the class of hidden Z-matrices? 


Problem 12.3 Characterize the subclass of hidden K -matrices for which the inverse 
is nonnegative. 


Neogy and Das [23] studied and characterized various classes of matrices that are 
defined based on PPTs. Now we have the following open problem. 


Problem 12.4 Give the complete characterization of the class of matrices for which 
at least one PPT is a Z-matrix. 
Chapter 13
On Nearest Matrix with Partially
Specified Eigen-Structure


13.1 Introduction


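Let $\mathbb{C}^{n \times n}$ denote the vector space of all $n \times n$ matrices with entries in $\mathbb{C}$. For
$A \in \mathbb{C}^{n \times n}$, let eig(A) denote the spectrum of A. Let $\lambda \in \mathrm{eig}(A)$. We denote the algebraic
multiplicity and the geometric multiplicity of $\lambda$ by $\mathrm{am}(\lambda, A)$ and $\mathrm{gm}(\lambda, A)$,
respectively. If $\Lambda := \{\lambda_1, \ldots, \lambda_\ell\} \subseteq \mathrm{eig}(A)$ then the total algebraic multiplicity of
$\Lambda$, denoted by $\mathrm{am}(\Lambda, A)$, is given by

$$\mathrm{am}(\Lambda, A) := \mathrm{am}(\lambda_1, A) + \cdots + \mathrm{am}(\lambda_\ell, A).$$

We denote the singular values of A by $\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge \sigma_{n-1}(A) \ge \sigma_n(A) \ge
0$ and refer to $\sigma_j(A)$ as the j-th singular value of A.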


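Eigenvalue Distance Problem: Let $A \in \mathbb{C}^{n \times n}$. Suppose that $\Lambda := \{\lambda_1, \ldots, \lambda_\ell\} \subset
\mathbb{C}$ is a given set and $m \in \mathbb{N}$ is such that $\ell \le m \le n$. Then solve the minimization
problem

$$\delta_m(\Lambda, A) := \inf\{\|X - A\|_2 : X \in \mathbb{C}^{n \times n},\ \Lambda \subseteq \mathrm{eig}(X) \text{ and } \mathrm{am}(\Lambda, X) \ge m\}. \qquad (13.1)$$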
If the infimum is attained at $A_{\mathrm{opt}}$ then we have $\delta_m(\Lambda, A) = \|A - A_{\mathrm{opt}}\|_2$. The
minimization problem (13.1) is easy to solve when $\Lambda = \{\lambda\}$ and $m = 1$. Indeed,
from the SVD of $A - \lambda I$ it follows that

$$\delta_1(\lambda, A) = \sigma_{\min}(A - \lambda I) \quad \text{and} \quad A_{\mathrm{opt}} = A - \sigma_{\min}(A - \lambda I)\, u v^*,$$

where $\sigma_{\min}(A - \lambda I)$ is the smallest singular value of $A - \lambda I$ and u and v are left
and right singular vectors of $A - \lambda I$ corresponding to $\sigma_{\min}(A - \lambda I)$.
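
The case $m = 1$ can be sketched in a few lines, assuming NumPy is available. The function name `nearest_matrix_with_eigenvalue` and the sample data are illustrative assumptions. The code only verifies that the constructed matrix has $\lambda$ as an eigenvalue and lies at distance $\sigma_{\min}(A - \lambda I)$ from A; that no closer matrix exists is the statement from the text and is not checked here.

```python
import numpy as np

def nearest_matrix_with_eigenvalue(A, lam):
    """Return (sigma_min, A_opt) with A_opt = A - sigma_min * u v^*,
    where u, v are singular vectors of A - lam*I for its smallest singular value."""
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - lam * np.eye(n))
    sigma_min = s[-1]
    u = U[:, -1]
    v_star = Vh[-1, :]          # last row of Vh is already v^*
    A_opt = A - sigma_min * np.outer(u, v_star)
    return sigma_min, A_opt

A = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=complex)
lam = 0.5 + 0.2j
delta1, A_opt = nearest_matrix_with_eigenvalue(A, lam)

distance = np.linalg.norm(A - A_opt, 2)
# Smallest singular value of A_opt - lam*I; it vanishes when lam is an eigenvalue.
residual = np.linalg.svd(A_opt - lam * np.eye(2), compute_uv=False)[-1]
```

Since $(A - \lambda I) v = \sigma_{\min} u$, subtracting $\sigma_{\min} u v^*$ makes $v$ a null vector of $A_{\mathrm{opt}} - \lambda I$, so $\lambda \in \mathrm{eig}(A_{\mathrm{opt}})$.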
For the special case when $\Lambda = \{\lambda\}$ and $m = 2$, Malyshev [22] proved that

$$\delta_2(\lambda, A) = \max_{\gamma \ge 0}\ \sigma_{2n-1}\left(\begin{bmatrix} A - \lambda I & \gamma I \\ 0 & A - \lambda I \end{bmatrix}\right) \qquad (13.2)$$

and constructed $A_{\mathrm{opt}}$ from left and right singular vectors. Mengi [23] considered the
case $\Lambda = \{\lambda\}$ and $m \le n$. Generalizing Malyshev's result, it is proved in Mengi [23]
that

$$\delta_m(\lambda, A) \ge \sup\{\sigma_{mn-m+1}(I_m \otimes (A - \lambda I) - N^T \otimes I_n) : N \in \mathrm{UN}(m, \mathbb{C})\}$$

and that the equality holds under two crucial assumptions, where $\mathrm{UN}(m, \mathbb{C}) \subset \mathbb{C}^{m \times m}$
denotes the space of strictly upper triangular (and hence nilpotent) matrices.

Gracia [10] and Lippert [20] considered the case of two distinct eigenvalues,
namely, the case when $\Lambda = \{\lambda_1, \lambda_2\}$ and $m = 2$. It is proved in Gracia [10] that

$$\delta_2(\Lambda, A) = \max_{\gamma \ge 0}\ \sigma_{2n-1}\left(\begin{bmatrix} A - \lambda_1 I & \gamma I \\ 0 & A - \lambda_2 I \end{bmatrix}\right)$$

provided that the maximum is attained at some $\gamma_0 > 0$. In such a case, $A_{\mathrm{opt}}$ is constructed
from the associated left and right singular vectors.

The general case of $\ell$ distinct eigenvalues and $m \ge \ell$, namely, the case when
$\Lambda = \{\lambda_1, \ldots, \lambda_\ell\}$ and $\ell \le m \le n$, is considered by Lippert [21]. It is proved that

$$\delta_m(\Lambda, A) \ge \sup\{\sigma_{mn-m+1}(I_m \otimes A - B^T \otimes I_n) : B \in S(m, \Lambda)\},$$

where $S(m, \Lambda)$ consists of $m \times m$ matrices B with $\mathrm{eig}(B) = \Lambda$ such that the Weyr
characteristic of B satisfies certain constraints; see Lippert [21].

Eigenvalue distance problems of the kind (13.1) have been considered for generalized
eigenvalue problems in Kressner et al. [18], for polynomial eigenvalue problems
in Karow and Mengi [14], and for nonlinear eigenvalue problems in Karow et al. [15].
See also Papathanasiou and Psarrakos [24], Psarrakos [25], Kokabifar et al. [16, 17]
and the references therein. The distance problem (13.1) is an offshoot of the well-known
Wilkinson's problem, which has been studied extensively in the literature
(Wilkinson [27-32], Ruhe [26], Demmel [7, 8], Kahan [13], Malyshev [22], Alam
[1, 2], Alam and Bora [3], Alam et al. [4]).

Wilkinson Problem (1965): Suppose that $A \in \mathbb{C}^{n \times n}$ has n distinct eigenvalues.
Let $w(A) := \inf\{\|X - A\|_2 : X \text{ is defective}\}$. Determine $w(A)$ and construct $A_{\mathrm{opt}}$
such that $A_{\mathrm{opt}}$ is defective and $\|A - A_{\mathrm{opt}}\|_2 = w(A)$.


It follows from (13.2) that $w(A) = \inf\{\delta_2(\lambda, A) : \lambda \in \mathbb{C}\}$. We refer to Alam [1,
2], Alam and Bora [3], Alam et al. [4] for a comprehensive treatment of Wilkinson's
problem based on pseudospectra of matrices. An algorithm that provides a solution
to Wilkinson's problem can be found in Alam et al. [4].

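Now consider a more specialized eigenvalue distance problem. Let $A \in \mathbb{C}^{n \times n}$ and
$B \in \mathbb{C}^{m \times m}$ be such that $\mathrm{eig}(B) = \Lambda$. Then solve the minimization problem

$$d(A, B) := \inf\left\{\|X - A\|_2 : X \in \mathbb{C}^{n \times n},\ X \sim \begin{bmatrix} B & \times \\ 0 & \times \end{bmatrix}\right\}, \qquad (13.3)$$

where the blocks denoted by $\times$ represent some matrices. Here $X \sim Y$ means X is
similar to Y. If the infimum is attained at $A_{\mathrm{opt}}$ then $A_{\mathrm{opt}}$ is similar to a block upper
triangular dilation of B. Hence the eigen-structure of B is subsumed by that of $A_{\mathrm{opt}}$.
Thus $A_{\mathrm{opt}}$ has a partially specified eigen-structure given by the eigen-structure of B.
Obviously, we have

$$\delta_m(\Lambda, A) = \inf\{d(A, B) : B \in \mathbb{C}^{m \times m},\ \mathrm{eig}(B) = \Lambda\}.$$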


The main purpose of this work is to present a concise framework for analysis of
the distance problem (13.3) and undertake an in-depth analysis of various issues that
influence the existence of a solution. We also revisit the eigenvalue distance problem
(13.1). Our approach is based on extensive use of SVD of the Sylvester operator
$T_{A,B}(X) := AX - XB$ and subdifferential calculus of singular value functions.
Analysis of the distance problem (13.3) involves optimization of certain singular
value functions which are locally Lipschitz continuous and hence are suitable for
nonsmooth analysis. In fact, the distance problem (13.3) leads to optimization of the
singular value function given by

$$f(N) := \sigma_m(A, D + N)$$

for all $N \in \mathrm{UN}(m, \mathbb{C})$, where $\sigma_m(A, D + N)$ is the m-th smallest singular value
of the Sylvester operator $T_{A, D+N}$ and $D \in \mathbb{C}^{m \times m}$ is a diagonal matrix such that
$\mathrm{eig}(D) = \Lambda$. We show that $f(N)$ has a global maximum and that either $f(N) = 0$
on $\mathrm{UN}(m, \mathbb{C})$ or $f(N) \ne 0$ for all nonderogatory matrices $N \in \mathrm{UN}(m, \mathbb{C})$, the latter
set being dense in $\mathrm{UN}(m, \mathbb{C})$.
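
The Sylvester operator can be represented by a Kronecker-structured matrix, which gives one way to evaluate $\sigma_m(A, B)$ numerically. The sketch below assumes NumPy and makes two assumptions of its own: X is taken to be $n \times m$ and vectorized column by column, and the singular values of $T_{A,B}$ are taken to be those of its matrix with respect to this vectorization. The function names and random test data are illustrative.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> A X - X B acting on the column-major vec(X),
    for A of size n x n, B of size m x m and X of size n x m."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

def sigma_m(A, B):
    """m-th smallest singular value of the Sylvester operator T_{A,B}."""
    m = B.shape[0]
    s = np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)  # descending order
    return s[-m]

rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
X = rng.standard_normal((n, m))

lhs = sylvester_matrix(A, B) @ X.flatten(order="F")
rhs = (A @ X - X @ B).flatten(order="F")

# For m = 1 and B = [lam] the operator reduces to A - lam*I.
lam = 0.3
sigma_scalar = sigma_m(A, np.array([[lam]]))
sigma_direct = np.linalg.svd(A - lam * np.eye(n), compute_uv=False)[-1]
```

The matrix `sylvester_matrix(A, B)` has the same form $I_m \otimes A - B^T \otimes I_n$ as the expressions inside the singular value bounds quoted earlier in this section, and the index `s[-m]` corresponds to $\sigma_{mn-m+1}$ in descending order.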

We determine subdifferentials of locally Lipschitz continuous singular value functions
and show that nonsmooth analysis of singular value functions together with
SVD of Sylvester operators provides a powerful unified framework for analysing the
eigenvalue distance problem (13.3) irrespective of multiplicity of the singular value
function $f(N)$ at a critical point. As a byproduct, we derive short and simple proofs
of the main results. Also, we prove several new results which provide new insights
into various conditions that lead to a solution of the distance problem (13.3). For
instance, if B is nonderogatory with Jordan canonical form J then we show that

$$d(A, B) = d(A, J) \ge \max\{\sigma_m(A, D + N) : N \in \mathrm{UN}(m, \mathbb{C})\} = f(N_0)$$

and the equality holds under the assumption that

$$U^* U = V^* V \quad \text{and} \quad \mathrm{rank}(V) = m, \qquad (13.4)$$

where U and V are left and right singular vectors of $T_{A, D+N_0}$ corresponding to
$\sigma_m(A, D + N_0)$ and $D := \mathrm{diag}(J)$. We mention that a solution of (13.1) has been
obtained in the literature under the assumption that the conditions in (13.4) hold.
However, these crucial conditions have not been analysed in the literature. With a
view to gaining insight into these conditions, we characterize the conditions in (13.4)
and show that these conditions are related to the centralizer of the $m \times m$ matrix $D + N_0$.
The rest of the paper is organized as follows. In Sect. 13.2, we present singular 
value decomposition of a Sylvester operator and discuss basic results which will be 
used in the subsequent development. In Sect. 13.3, we reformulate the eigenvalue 
distance problem and discuss minimal perturbations and their construction from SVD 
of a Sylvester operator. In Sect. 13.4, we consider singular value functions and deter- 
mine their subdifferentials. In Sect. 13.5, we present a singular value characterization 
of the eigenvalue distance problem and provide a solution. Also, for the special case 
when m = 2, we present a short proof of the fact that a solution of the problem (13.1) 
always exists. 


Notation 


Let $X \in \mathbb{C}^{m\times n}$. We denote the null space of X by ker(X) and the column space
of X by span(X). Two n × n matrices A and B are said to be similar if there is
an n × n nonsingular matrix S such that $S^{-1}AS = B$. We write $A \sim B$ when A
and B are similar matrices. We denote the conjugate transpose (resp., transpose)
of a matrix $A \in \mathbb{C}^{m\times n}$ by $A^*$ (resp., $A^\top$). The Kronecker product of two matrices
A and B is denoted by $A \otimes B$. The n × n identity matrix is denoted by $I_n$. We
consider the standard inner product on $\mathbb{C}^n$, that is, $\langle x, y\rangle := y^*x$ for $x, y \in \mathbb{C}^n$, and
the norm $\|x\|_2 := \sqrt{\langle x, x\rangle}$. We denote the spectral norm of $A \in \mathbb{C}^{m\times n}$ by $\|A\|_2$, which
is defined by $\|A\|_2 := \max\{\|Ax\|_2 : x \in \mathbb{C}^n,\ \|x\|_2 = 1\}$. We consider the Frobenius
inner product on $\mathbb{C}^{m\times n}$ given by $\langle X, Y\rangle := \operatorname{Tr}(Y^*X)$ for $X, Y \in \mathbb{C}^{m\times n}$, where Tr(A)
is the trace of $A \in \mathbb{C}^{n\times n}$. Then $\|X\|_F := \sqrt{\langle X, X\rangle}$ is the Frobenius norm of X.
We write an n × n diagonal matrix D with diagonal (scalar) entries $\lambda_1, \dots, \lambda_n$ as
$D = \operatorname{diag}(\lambda_1, \dots, \lambda_n) = \operatorname{diag}(\lambda_i)$. Similarly, we write a block diagonal matrix A
with diagonal blocks $A_1$ and $A_2$ as $A = A_1 \oplus A_2$, that is,

$$A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}.$$

On the other hand, if $A \in \mathbb{C}^{n\times n}$ then diag(A) denotes the diagonal matrix obtained
from A by setting the off-diagonal entries of A to zero. We denote the Moore-Penrose
pseudo-inverse of $A \in \mathbb{C}^{m\times n}$ by $A^\dagger$. Then $A^\dagger \in \mathbb{C}^{n\times m}$ is the unique matrix such that
$AA^\dagger$ and $A^\dagger A$ are orthogonal projections and $AA^\dagger A = A$ and $A^\dagger AA^\dagger = A^\dagger$.

13.2 SVD of Sylvester Operator


Singular value decomposition (SVD) of a Sylvester operator will play a crucial 
role in analysing the eigenvalue distance problem. We briefly recall singular value 
decomposition of a linear operator acting on a finite dimensional inner product space. 
Let $\mathcal{H}$ be a finite dimensional Hilbert space and let $\dim \mathcal{H} = n$. Let $L(\mathcal{H})$ denote
the space of all linear operators on $\mathcal{H}$.


Theorem 13.1 (SVD, Axler [5]) Let $T \in L(\mathcal{H})$. Then there exist unique real numbers
$\sigma_1(T) \ge \cdots \ge \sigma_n(T) \ge 0$ and orthonormal bases $\{u_1, \dots, u_n\}$ and $\{v_1, \dots, v_n\}$
of $\mathcal{H}$ such that

$$Tx = \sigma_1(T)\langle x, v_1\rangle u_1 + \cdots + \sigma_n(T)\langle x, v_n\rangle u_n, \quad x \in \mathcal{H}.$$

Equivalently, $Tv_j = \sigma_j(T)u_j$ and $T^*u_j = \sigma_j(T)v_j$ for j = 1 : n.

The numbers $\sigma_1(T), \dots, \sigma_n(T)$ are called singular values of T. The vectors $u_j$ and
$v_j$ are called left and right singular vectors, respectively, of T corresponding to the
singular value $\sigma_j(T)$ for j = 1 : n. It follows that

$$\dim\ker(T) \ge \ell \iff \operatorname{rank}(T) \le n-\ell \iff \sigma_{n-\ell+1}(T) = 0. \tag{13.5}$$
Singular values are well behaved under perturbation in the sense that a small
perturbation of the operator results in a small variation in the singular values. More
precisely, for any $\Delta T \in L(\mathcal{H})$, we have the following bounds (Horn and Johnson [11])

$$|\sigma_j(T + \Delta T) - \sigma_j(T)| \le \|\Delta T\|, \quad j = 1 : n. \tag{13.6}$$


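Theorem 13.2 (Eckart-Young, Horn and Johnson [11]) Let $T \in L(\mathcal{H})$. Consider
the singular value decomposition of T given by

$$Tx = \sigma_1(T)\langle x, v_1\rangle u_1 + \cdots + \sigma_n(T)\langle x, v_n\rangle u_n, \quad x \in \mathcal{H}.$$

For $\ell < n$, define the linear operator $T_\ell : \mathcal{H} \to \mathcal{H}$ by

$$T_\ell x := \sigma_1(T)\langle x, v_1\rangle u_1 + \cdots + \sigma_\ell(T)\langle x, v_\ell\rangle u_\ell, \quad x \in \mathcal{H}.$$

Then $\min\{\|T - S\| : S \in L(\mathcal{H}) \text{ and } \operatorname{rank}(S) \le \ell\} = \sigma_{\ell+1}(T) = \|T - T_\ell\|$.
Thus, if $\ell < \operatorname{rank}(T)$ then $T_\ell$ is the best rank-$\ell$ approximation of T from $L(\mathcal{H})$.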
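The Eckart-Young statement can be checked numerically for a matrix acting on $\mathbb{C}^n$. The following is a minimal sketch using NumPy; the random test matrix, the size 6 and the choice $\ell = 3$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

# SVD: T = U diag(s) Vh, with singular values s in descending order.
U, s, Vh = np.linalg.svd(T)

ell = 3
# Truncated operator T_ell keeps only the ell largest singular triplets.
T_ell = (U[:, :ell] * s[:ell]) @ Vh[:ell, :]

# The spectral-norm error of the truncation should equal sigma_{ell+1}(T),
# which is s[ell] with zero-based indexing.
err = np.linalg.norm(T - T_ell, 2)
print(err, s[ell])
```

The two printed numbers should agree up to rounding, in line with $\|T - T_\ell\| = \sigma_{\ell+1}(T)$.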


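Sylvester Operator: Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. The Sylvester operator

$$T_{A,B} : \mathbb{C}^{n\times m} \to \mathbb{C}^{n\times m} \text{ is given by } T_{A,B}(X) := AX - XB.$$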




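It is well known (Horn and Johnson [11]) that $\ker(T_{A,B}) = \{0\} \iff \operatorname{eig}(A) \cap \operatorname{eig}(B) = \emptyset$.
Consequently, $\ker(T_{A,B}) \neq \{0\} \iff \operatorname{eig}(A) \cap \operatorname{eig}(B) \neq \emptyset \iff$ the
equation $AX - XB = 0$ has a nontrivial solution X. The adjoint of $T_{A,B}$ is given by

$$T_{A,B}^*(X) = A^*X - XB^* \quad\text{for } X \in \mathbb{C}^{n\times m}.$$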


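For $X := [x_1 \ \cdots \ x_m] \in \mathbb{C}^{n\times m}$, let $\operatorname{Vec}(X) \in \mathbb{C}^{mn}$ denote the vector in $\mathbb{C}^{mn}$ obtained
by stacking the columns of X, given by

$$\operatorname{Vec}(X) = \operatorname{Vec}([x_1 \ \cdots \ x_m]) := \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} \in \mathbb{C}^{mn}.$$

Then the map $\operatorname{Vec} : \mathbb{C}^{n\times m} \to \mathbb{C}^{mn}$, $X \mapsto \operatorname{Vec}(X)$, defines a linear isometric
isomorphism. Further, for $X \in \mathbb{C}^{n\times m}$, we have

$$\operatorname{Vec}(T_{A,B}(X)) = (I_m \otimes A - B^\top \otimes I_n)\operatorname{Vec}(X).$$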
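The vectorization identity and the adjoint formula above lend themselves to a quick numerical check. The sketch below assumes NumPy; the helper names `vec` and `sylvester_matrix` and the random test data are illustrative choices, not notation from the text.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one long vector (column-major order)."""
    return X.flatten(order="F")

def sylvester_matrix(A, B):
    """Matrix of X -> AX - XB acting on vec(X): I_m (x) A - B^T (x) I_n."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
X = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
Y = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

# Vec(AX - XB) versus (I_m (x) A - B^T (x) I_n) Vec(X).
lhs = vec(A @ X - X @ B)
rhs = sylvester_matrix(A, B) @ vec(X)

# Adjoint check: <T(X), Y> = <X, T*(Y)> with <X, Y> = Tr(Y* X)
# and T*(Y) = A* Y - Y B*.
inner_TX_Y = np.trace(Y.conj().T @ (A @ X - X @ B))
TstarY = A.conj().T @ Y - Y @ B.conj().T
inner_X_TstarY = np.trace(TstarY.conj().T @ X)

print(np.allclose(lhs, rhs), np.isclose(inner_TX_Y, inner_X_TstarY))
```

Note that the plain transpose `B.T` (not the conjugate transpose) appears in the Kronecker form, matching $B^\top$ in the identity.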


SVD of the Sylvester operator $T_{A,B}$ will play an important role in solving the
eigenvalue distance problem. Since $T_{A,B} : \mathbb{C}^{n\times m} \to \mathbb{C}^{n\times m}$ is a linear operator on the
inner product space $\mathbb{C}^{n\times m}$ equipped with the inner product $\langle X, Y\rangle := \operatorname{Tr}(Y^*X)$, by
Theorem 13.1 we have the following result.


Theorem 13.3 (SVD of $T_{A,B}$) There exist orthonormal bases $\{U_1, \dots, U_{mn}\}$ and
$\{V_1, \dots, V_{mn}\}$ of $\mathbb{C}^{n\times m}$ and unique real numbers $\sigma_1(T_{A,B}) \ge \cdots \ge \sigma_{mn}(T_{A,B}) \ge 0$
such that

$$T_{A,B}(X) = \sigma_1(T_{A,B})\langle X, V_1\rangle U_1 + \cdots + \sigma_{mn}(T_{A,B})\langle X, V_{mn}\rangle U_{mn} \tag{13.7}$$

for all $X \in \mathbb{C}^{n\times m}$. Equivalently, we have

$$AV_j - V_jB = \sigma_j(T_{A,B})U_j \quad\text{and}\quad A^*U_j - U_jB^* = \sigma_j(T_{A,B})V_j, \quad j = 1 : mn. \tag{13.8}$$


We mention that the SVD in (13.7) can also be equivalently written as

$$T_{A,B}(V_j) = \sigma_j(T_{A,B})U_j \quad\text{and}\quad T_{A,B}^*(U_j) = \sigma_j(T_{A,B})V_j, \quad j = 1 : mn.$$


The matrices $U_j$ and $V_j$ are called left and right singular vectors of $T_{A,B}$ corresponding
to the singular value $\sigma_j(T_{A,B})$ for j = 1 : mn. The rank of $T_{A,B}$ is the number of
nonzero singular values of $T_{A,B}$.


Corollary 13.1 Consider the SVD of $T_{A,B}$ as given in Theorem 13.3. Set $u_j :=
\operatorname{Vec}(U_j)$ and $v_j := \operatorname{Vec}(V_j)$ for j = 1 : mn. Then the SVD of $(I_m \otimes A - B^\top \otimes I_n)$
is given by

$$(I_m \otimes A - B^\top \otimes I_n)v_j = \sigma_j(T_{A,B})u_j \quad\text{and}\quad (I_m \otimes A - B^\top \otimes I_n)^*u_j = \sigma_j(T_{A,B})v_j$$

for j = 1 : mn. Further, $\sigma_j(T_{A,B}) = \sigma_j(I_m \otimes A - B^\top \otimes I_n)$ for j = 1 : mn.
Furthermore, we have

$$\dim\ker(T_{A,B}) \ge m \iff \operatorname{rank}(T_{A,B}) \le mn - m \iff \sigma_{mn-m+1}(T_{A,B}) = 0.$$

Proof Since $\operatorname{Vec}(T_{A,B}(X)) = (I_m \otimes A - B^\top \otimes I_n)\operatorname{Vec}(X)$ for all $X \in \mathbb{C}^{n\times m}$ and
$\operatorname{Vec} : \mathbb{C}^{n\times m} \to \mathbb{C}^{mn}$ is an isometric isomorphism, the SVD of $(I_m \otimes A - B^\top \otimes I_n)$
follows from Theorem 13.3. The assertion on the rank of $T_{A,B}$ follows from (13.5).


We consider the induced operator norm of $T_{A,B}$ given by

$$\|T_{A,B}\| := \sup\{\|T_{A,B}(X)\|_F : X \in \mathbb{C}^{n\times m} \text{ and } \|X\|_F = 1\}.$$

Then it follows that $\|T_{A,B}\| = \sigma_1(T_{A,B}) = \|I_m \otimes A - B^\top \otimes I_n\|_2$. For the rest
of the article, we set $\sigma_j(A, B) := \sigma_{mn-j+1}(T_{A,B})$ for j = 1 : mn. This counts the
singular values of $T_{A,B}$ in ascending order of magnitude and we have

$$\sigma_1(A, B) \le \cdots \le \sigma_{mn}(A, B).$$
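The ascending ordering $\sigma_j(A, B)$ can be computed directly from the Kronecker form of Corollary 13.1. The following is a minimal NumPy sketch; the function names `sylvester_matrix` and `sigma` and the small example matrices are assumptions made for illustration.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> AX - XB acting on vec(X): I_m (x) A - B^T (x) I_n."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

def sigma(A, B):
    """Singular values of T_{A,B} in ascending order.

    sigma(A, B)[j - 1] plays the role of sigma_j(A, B) in the text.
    """
    s_desc = np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)
    return s_desc[::-1]

# A and B share the eigenvalue 2, so T_{A,B} has a nontrivial kernel
# and the smallest singular value sigma_1(A, B) vanishes.
A = np.diag([1.0, 2.0, 3.0])
B = np.array([[2.0]])
sig = sigma(A, B)

# The operator norm of T_{A,B} is the largest singular value.
op_norm = np.linalg.norm(sylvester_matrix(A, B), 2)
print(sig, op_norm)
```

Here $T_{A,B}$ reduces to $x \mapsto (A - 2I)x$, so the ascending singular values are 0, 1, 1.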


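Proposition 13.1 Let $\Delta A \in \mathbb{C}^{n\times n}$ and $\Delta B \in \mathbb{C}^{m\times m}$. For j = 1 : mn, we have

$$|\sigma_j(A + \Delta A, B) - \sigma_j(A, B)| \le \|\Delta A\|_2,$$
$$|\sigma_j(A, B + \Delta B) - \sigma_j(A, B)| \le \|\Delta B\|_2,$$
$$|\sigma_j(A + \Delta A, B + \Delta B) - \sigma_j(A, B)| \le \|I_m \otimes \Delta A - (\Delta B)^\top \otimes I_n\|_2.$$

Further, for any unitary matrices $U \in \mathbb{C}^{n\times n}$ and $V \in \mathbb{C}^{m\times m}$, we have

$$\sigma_j(U^*AU, V^*BV) = \sigma_j(A, B), \quad j = 1 : mn.$$

Proof We have $T_{A+\Delta A, B+\Delta B} = T_{A,B} + T_{\Delta A, \Delta B}$. Hence by (13.6), we have

$$|\sigma_j(A + \Delta A, B + \Delta B) - \sigma_j(A, B)| = |\sigma_{mn-j+1}(T_{A+\Delta A, B+\Delta B}) - \sigma_{mn-j+1}(T_{A,B})| \le \|T_{\Delta A, \Delta B}\| = \|I_m \otimes \Delta A - (\Delta B)^\top \otimes I_n\|_2.$$

Setting $\Delta B = 0$, we have $|\sigma_j(A + \Delta A, B) - \sigma_j(A, B)| \le \|\Delta A\|_2$. Similarly, setting
$\Delta A = 0$, we have $|\sigma_j(A, B + \Delta B) - \sigma_j(A, B)| \le \|\Delta B\|_2$.

Finally, note that $V^\top \otimes U^*$ and $(V^\top \otimes U^*)^* = (V^\top)^* \otimes U$ are unitary and

$$(V^\top \otimes U^*)(I_m \otimes A - B^\top \otimes I_n)(V^\top \otimes U^*)^* = I_m \otimes U^*AU - V^\top B^\top (V^\top)^* \otimes I_n = I_m \otimes U^*AU - (V^*BV)^\top \otimes I_n.$$

Hence for j = 1 : mn, by Corollary 13.1, we have

$$\sigma_j(U^*AU, V^*BV) = \sigma_{mn-j+1}(I_m \otimes U^*AU - (V^*BV)^\top \otimes I_n) = \sigma_{mn-j+1}\big((V^\top \otimes U^*)(I_m \otimes A - B^\top \otimes I_n)(V^\top \otimes U^*)^*\big) = \sigma_{mn-j+1}(I_m \otimes A - B^\top \otimes I_n) = \sigma_j(A, B).$$

This completes the proof.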
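The bounds and the unitary invariance in Proposition 13.1 can be observed on random data. This is a sketch under stated assumptions: NumPy is used, the helpers `sylvester_matrix`, `sigma` and `crandn` are hypothetical names, and the perturbation size 1e-2 is arbitrary.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> AX - XB acting on vec(X): I_m (x) A - B^T (x) I_n."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

def sigma(A, B):
    """Singular values of T_{A,B} in ascending order."""
    return np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)[::-1]

rng = np.random.default_rng(2)

def crandn(*shape):
    """Complex Gaussian test matrix."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

n, m = 4, 2
A, B = crandn(n, n), crandn(m, m)
dA, dB = 1e-2 * crandn(n, n), 1e-2 * crandn(m, m)

s0 = sigma(A, B)
gap_A = np.max(np.abs(sigma(A + dA, B) - s0))
gap_B = np.max(np.abs(sigma(A, B + dB) - s0))
gap_AB = np.max(np.abs(sigma(A + dA, B + dB) - s0))
bound_AB = np.linalg.norm(sylvester_matrix(dA, dB), 2)

# Unitary matrices from QR factorizations of random complex matrices.
U, _ = np.linalg.qr(crandn(n, n))
V, _ = np.linalg.qr(crandn(m, m))
s_rot = sigma(U.conj().T @ A @ U, V.conj().T @ B @ V)

print(gap_A, np.linalg.norm(dA, 2))
print(gap_B, np.linalg.norm(dB, 2))
print(gap_AB, bound_AB)
print(np.allclose(s_rot, s0))
```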


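Observe that $\|I_m \otimes \Delta A - (\Delta B)^\top \otimes I_n\|_2 \le \|\Delta A\|_2 + \|\Delta B\|_2$. Hence we have
$|\sigma_j(A + \Delta A, B + \Delta B) - \sigma_j(A, B)| \le \|\Delta A\|_2 + \|\Delta B\|_2$. Next, by Theorem 13.2,

$$\min\{\|T_{A,B} - S\| : S \in L(\mathbb{C}^{n\times m}),\ \operatorname{rank}(S) \le mn - m\} = \sigma_m(A, B) \tag{13.9}$$

and the minimum is attained at $S_{\mathrm{opt}}$ given by

$$S_{\mathrm{opt}}(X) = \sigma_1(T_{A,B})\langle X, V_1\rangle U_1 + \cdots + \sigma_{mn-m}(T_{A,B})\langle X, V_{mn-m}\rangle U_{mn-m}$$

for all $X \in \mathbb{C}^{n\times m}$. Hence $\|T_{A,B} - S_{\mathrm{opt}}\| = \sigma_{mn-m+1}(T_{A,B}) = \sigma_m(A, B)$.

We mention that $S_{\mathrm{opt}} - T_{A,B}$ may not be of the form $T_{\Delta A, 0}$, that is, there may not
exist $\Delta A$ such that $S_{\mathrm{opt}} = T_{A+\Delta A, B}$. Next, we consider conditions under which we
have $S_{\mathrm{opt}} = T_{A+\Delta A, B}$. This will play a crucial role in solving the eigenvalue distance
problem.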


13.3 Minimal Perturbation


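Let $A \in \mathbb{C}^{n\times n}$ and $X \in \mathbb{C}^{n\times m}$ be such that rank(X) = m. Then span(X) is said to be
an invariant subspace of A if $A\,\operatorname{span}(X) \subset \operatorname{span}(X)$. Clearly, span(X) is an invariant
subspace of A $\iff AX = XB$ for some $B \in \mathbb{C}^{m\times m}$. In such a case, we have
$\operatorname{eig}(B) \subset \operatorname{eig}(A)$.

Definition 13.1 Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. We write $B \subset A$ if $m \le n$ and there
exists $X \in \mathbb{C}^{n\times m}$ such that rank(X) = m and AX = XB.


Obviously $B \subset A \implies \operatorname{eig}(B) \subset \operatorname{eig}(A)$. Moreover, the spectral structure of B is
subsumed by that of A. This is evident from the fact that

$$B \subset A \iff A \sim \begin{bmatrix} B & * \\ 0 & * \end{bmatrix}, \tag{13.10}$$

where the blocks denoted by $*$ represent some matrices. Thus A is similar to a block
upper triangular dilation of B.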

Let $\mathcal{C}(B) := \{X \in \mathbb{C}^{m\times m} : BX = XB\}$ be the centralizer of B. Then $\mathcal{C}(B)$ is a
subspace of $\mathbb{C}^{m\times m}$ and by Gantmacher [9, Theorem 1, p.219] $\dim(\mathcal{C}(B)) \ge m$.
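Since $\mathcal{C}(B) = \ker(T_{B,B})$, the dimension of the centralizer can be read off from the rank of $I_m \otimes B - B^\top \otimes I_m$. The sketch below assumes NumPy; the function name `centralizer_dim` and the three example matrices are illustrative choices.

```python
import numpy as np

def centralizer_dim(B):
    """dim C(B) = nullity of I_m (x) B - B^T (x) I_m, i.e. of X -> BX - XB."""
    m = B.shape[0]
    K = np.kron(np.eye(m), B) - np.kron(B.T, np.eye(m))
    return m * m - np.linalg.matrix_rank(K)

# Nilpotent Jordan block (nonderogatory): the centralizer consists of
# polynomials in B, so its dimension is exactly m.
J = np.diag(np.ones(2), 1)
dim_jordan = centralizer_dim(J)

# Distinct eigenvalues: the centralizer is the set of diagonal matrices.
dim_distinct = centralizer_dim(np.diag([1.0, 2.0, 3.0]))

# Identity: every matrix commutes with it, so the dimension is m^2.
dim_identity = centralizer_dim(np.eye(3))

print(dim_jordan, dim_distinct, dim_identity)
```

The lower bound $\dim(\mathcal{C}(B)) \ge m$ is attained in the first two cases and exceeded in the third.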
Theorem 13.4 Suppose that $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. Then the following results
hold.


(a) $B \subset A \implies S^{-1}BS \subset A$ for any m × m nonsingular matrix S.
(b) $B \subset A \iff \ker(T_{A,B})$ contains a full column rank matrix.
(c) $B \subset A \implies \dim\ker(T_{A,B}) = \dim\ker(I_m \otimes A - B^\top \otimes I_n) \ge m$.
(d) $B \subset A \implies \sigma_m(A, B) = \sigma_{mn-m+1}(I_m \otimes A - B^\top \otimes I_n) = 0$.


Proof The proofs of (a) and (b) are immediate. Suppose that $B \subset A$. Then there
exists $X \in \ker(T_{A,B})$ such that rank(X) = m. Now for any $D \in \mathbb{C}^{m\times m}$, it is easily
seen that

$$XD \in \ker(T_{A,B}) \iff BD = DB \iff D \in \mathcal{C}(B).$$

By Gantmacher [9, Theorem 1, p.219] the dimension of $\mathcal{C}(B)$ is at least m. Since
X has full column rank and $XD \in \ker(T_{A,B})$ for all $D \in \mathcal{C}(B)$, it follows that
$\dim\ker(T_{A,B}) \ge m$. Since

$$X \in \ker(T_{A,B}) \iff \operatorname{Vec}(X) \in \ker(I_m \otimes A - B^\top \otimes I_n),$$

the proof of (c) follows.

Finally, since $\sigma_m(A, B) = \sigma_{mn-m+1}(T_{A,B}) = \sigma_{mn-m+1}(I_m \otimes A - B^\top \otimes I_n)$, the
proof of (d) follows from Corollary 13.1.
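Theorem 13.4(d) can be illustrated by building A as a matrix similar to a block upper triangular dilation of B, as in (13.10). This is a minimal sketch assuming NumPy; the helper names, the sizes and the well-conditioned choice of the similarity S are assumptions for the example.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> AX - XB acting on vec(X): I_m (x) A - B^T (x) I_n."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

def sigma(A, B):
    """Singular values of T_{A,B} in ascending order."""
    return np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)[::-1]

rng = np.random.default_rng(3)
n, m = 5, 2
B = rng.standard_normal((m, m))

# Block upper triangular dilation M = [[B, *], [0, *]] of B.
M = np.zeros((n, n))
M[:m, :m] = B
M[:m, m:] = rng.standard_normal((m, n - m))
M[m:, m:] = rng.standard_normal((n - m, n - m))

# A small perturbation of the identity keeps S comfortably nonsingular.
S = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = S @ M @ np.linalg.inv(S)

# The first m columns of S span an invariant subspace: A X = X B.
X = S[:, :m]
residual = np.linalg.norm(A @ X - X @ B)

sig = sigma(A, B)
print(residual, sig[:m], sig[-1])
```

The first m entries of `sig` should be at rounding level compared with the largest one, consistent with $\sigma_m(A, B) = 0$ when $B \subset A$.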


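Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. Then, in view of (13.3) and (13.10), we have

$$d(A, B) = \inf\{\|\Delta A\|_2 : \Delta A \in \mathbb{C}^{n\times n} \text{ and } B \subset A + \Delta A\}. \tag{13.11}$$

By Theorem 13.4(a), $B \subset A + \Delta A \implies S^{-1}BS \subset A + \Delta A$ for any m × m nonsingular
matrix S. Hence it follows that

$$d(A, B) = d(A, S^{-1}BS)$$

for any m × m nonsingular matrix S. For the rest of the paper, consider the set
$\Lambda := \{\lambda_1, \dots, \lambda_\ell\} \subset \mathbb{C}$ consisting of $\ell$ distinct real or complex numbers. For any
natural number m such that $\ell \le m \le n$, consider the set

$$\mathcal{M}_m(\Lambda) := \{X \in \mathbb{C}^{n\times n} : \Lambda \subset \operatorname{eig}(X),\ \operatorname{am}(\Lambda, X) \ge m\}.$$

Then we have $\delta_m(\Lambda, A) = \inf\{\|\Delta A\|_2 : A + \Delta A \in \mathcal{M}_m(\Lambda)\} =: \operatorname{dist}(A, \mathcal{M}_m(\Lambda))$.
Our aim is to determine $\delta_m(\Lambda, A)$ and a matrix $A_{\mathrm{opt}}$ such that $\|A - A_{\mathrm{opt}}\|_2 =
\delta_m(\Lambda, A)$. It is evident that

$$\delta_m(\Lambda, A) = \inf\{d(A, B) : B \in \mathbb{C}^{m\times m},\ \operatorname{eig}(B) = \Lambda\}. \tag{13.12}$$

Thus, in order to determine $\delta_m(\Lambda, A)$, we need to determine d(A, B). For this
purpose, we need the following result.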
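Theorem 13.5 Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. Then the following results hold.


(a) $B \subset A + \Delta A \implies \sigma_m(A, B) = \sigma_{mn-m+1}(I_m \otimes A - B^\top \otimes I_n) \le \|\Delta A\|_2$.

(b) $B \subset A + \Delta A \implies \sigma_m(A, S^{-1}BS) \le \|\Delta A\|_2$ for any nonsingular matrix S.

(c) $A + \Delta A \in \mathcal{M}_m(\Lambda) \iff$ there exists $B \in \mathbb{C}^{m\times m}$ such that $\operatorname{eig}(B) = \Lambda$ and
$B \subset A + \Delta A$.

(d) $A + \Delta A \in \mathcal{M}_m(\Lambda) \implies$ there exists $B \in \mathbb{C}^{m\times m}$ such that $\operatorname{eig}(B) = \Lambda$ and
$\sigma_m(A, B) = \sigma_{mn-m+1}(I_m \otimes A - B^\top \otimes I_n) \le \|\Delta A\|_2$.


Proof By Theorem 13.4(d), $B \subset A + \Delta A \implies \sigma_m(A + \Delta A, B) = 0$. Hence by
Proposition 13.1, we have $\sigma_m(A, B) = |\sigma_m(A + \Delta A, B) - \sigma_m(A, B)| \le \|\Delta A\|_2$.
Consequently, we have $\sigma_m(A, B) = \sigma_{mn-m+1}(I_m \otimes A - B^\top \otimes I_n) \le \|\Delta A\|_2$. This
proves (a). Next, by Theorem 13.4(a), $B \subset A + \Delta A \implies S^{-1}BS \subset A + \Delta A$ for any
nonsingular S. Hence (b) follows from (a).

Now, if $A + \Delta A \in \mathcal{M}_m(\Lambda)$ then $\operatorname{am}(\Lambda, A + \Delta A) \ge m \implies$ there exist $X \in
\mathbb{C}^{n\times m}$ with rank(X) = m and $B \in \mathbb{C}^{m\times m}$ such that $(A + \Delta A)X = XB$ and $\operatorname{eig}(B) =
\Lambda$. In other words, $B \subset A + \Delta A$ and $\operatorname{eig}(B) = \Lambda$. Conversely, if there exists $B \in
\mathbb{C}^{m\times m}$ such that $B \subset A + \Delta A$ and $\operatorname{eig}(B) = \Lambda$ then obviously $A + \Delta A \in \mathcal{M}_m(\Lambda)$.
This proves (c). Finally, (d) follows from (a) and (c).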


The next result shows that $d(A, B) = \sigma_m(A, B)$ provided that appropriate conditions
are satisfied.


Theorem 13.6 (Minimal Perturbation) Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. Let $U \in
\mathbb{C}^{n\times m}$ and $V \in \mathbb{C}^{n\times m}$ be left and right singular vectors of $T_{A,B}$ corresponding to the
singular value $\sigma_m(A, B)$, respectively. Suppose that rank(V) = m and $U^*U = V^*V$.
Define

$$\Delta A := -\sigma_m(A, B)UV^\dagger,$$

where $V^\dagger$ is the Moore-Penrose pseudo-inverse of V. Then $B \subset A + \Delta A$. Moreover,

$$(A + \Delta A)V = VB \quad\text{and}\quad \|\Delta A\|_2 = \sigma_m(A, B) = d(A, B).$$

Define $S_{\mathrm{opt}} := T_{A+\Delta A, B}$. Then $\dim\ker(S_{\mathrm{opt}}) \ge m$ and

$$\|S_{\mathrm{opt}} - T_{A,B}\| = \min\{\|S - T_{A,B}\| : S \in L(\mathbb{C}^{n\times m}),\ \operatorname{rank}(S) \le mn - m\} = \sigma_m(A, B).$$

Proof Note that $V^\dagger V = I_m$. Hence the SVD of $T_{A,B}$ gives

$$AV - VB = \sigma_m(A, B)U \implies (A - \sigma_m(A, B)UV^\dagger)V = VB \implies (A + \Delta A)V = VB.$$

This shows that $B \subset A + \Delta A$. Next, since $U^*U = V^*V$ and $VV^\dagger$ is an orthogonal
projection, we have

$$\|UV^\dagger\|_2^2 = \lambda_{\max}((UV^\dagger)^*(UV^\dagger)) = \lambda_{\max}((V^\dagger)^*U^*UV^\dagger) = \lambda_{\max}((V^\dagger)^*V^*VV^\dagger) = \lambda_{\max}((VV^\dagger)^*(VV^\dagger)) = \lambda_{\max}(VV^\dagger) = 1,$$

where $\lambda_{\max}(Y)$ denotes the largest eigenvalue of a positive semidefinite matrix Y.
This proves that $\|\Delta A\|_2 = \sigma_m(A, B)$.

Since $B \subset A + \Delta A$, by Theorem 13.4(c), $\dim\ker(S_{\mathrm{opt}}) = \dim\ker(T_{A+\Delta A, B}) \ge
m$. Thus $\operatorname{rank}(S_{\mathrm{opt}}) \le mn - m$ and $\|S_{\mathrm{opt}} - T_{A,B}\| = \|T_{\Delta A, 0}\| = \|\Delta A\|_2 =
\sigma_m(A, B)$. Hence by (13.9), it follows that $S_{\mathrm{opt}}$ is the best approximation of $T_{A,B}$
such that $\dim\ker(S_{\mathrm{opt}}) \ge m$.

Finally, since $B \subset A + \Delta A$, by (13.11), $d(A, B) \le \|\Delta A\|_2 = \sigma_m(A, B)$. On
the other hand, by Theorem 13.5(a), for any $\Delta Z$ such that $B \subset A + \Delta Z$ we have
$\sigma_m(A, B) \le \|\Delta Z\|_2$. Now taking the infimum over all such $\Delta Z$, we have $\sigma_m(A, B) \le
d(A, B)$. Hence we have $d(A, B) = \sigma_m(A, B)$. This completes the proof.
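The construction in Theorem 13.6 is easiest to see in the scalar case m = 1, where $B = [\mu]$, $T_{A,B}$ acts as $x \mapsto (A - \mu I_n)x$, and the hypotheses $U^*U = V^*V$ and rank(V) = 1 hold automatically for unit singular vectors. The sketch below assumes NumPy; the value of `mu` and the random matrix A are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
mu = 0.3 + 0.1j
B = np.array([[mu]])

# For m = 1 the matrix of T_{A,B} is simply A - mu * I_n.
K = A - mu * np.eye(n)
W, s, Zh = np.linalg.svd(K)

# sigma_1(A, B) is the smallest singular value; take the matching vectors.
sigma_m = s[-1]
U = W[:, -1:]             # left singular vector, shape (n, 1)
V = Zh[-1:, :].conj().T   # right singular vector, shape (n, 1)

# Minimal perturbation dA = -sigma_m * U * pinv(V).
dA = -sigma_m * U @ np.linalg.pinv(V)

# (A + dA) V = V B, so mu becomes an eigenvalue of A + dA.
residual = np.linalg.norm((A + dA) @ V - V @ B)
eig_gap = np.min(np.abs(np.linalg.eigvals(A + dA) - mu))
print(residual, eig_gap, np.linalg.norm(dA, 2), sigma_m)
```

In this case the perturbation has spectral norm equal to the smallest singular value of $A - \mu I_n$, in agreement with $\|\Delta A\|_2 = \sigma_m(A, B)$.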


Remark 13.1 The proof of Theorem 13.6 shows that

$$U^*U = V^*V \implies \|UV^\dagger\|_2 = 1$$

irrespective of the rank of V. When V is rank deficient, assume that the condition
$UV^\dagger V = U$ is satisfied. Then we have $(A - \sigma_m(A, B)UV^\dagger)V = VB$ and
$\|UV^\dagger\|_2 = 1$. This shows that for $\Delta A := -\sigma_m(A, B)UV^\dagger$, we have $\|\Delta A\|_2 =
\sigma_m(A, B)$ and that $A + \Delta A$ and B have common eigenvalues and span(V) is an
invariant subspace of $A + \Delta A$ corresponding to the common eigenvalues. Note that
the full column rank of V is crucial for ensuring that $B \subset A + \Delta A$.
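The norm identity in Remark 13.1 does not depend on U and V being singular vectors, only on $U^*U = V^*V$. The sketch below, assuming NumPy, builds such a pair as U = WV with W unitary (a choice made purely for illustration, with an explicit `rcond` cutoff for the pseudo-inverse) and covers both a full-rank and a rank-deficient V.

```python
import numpy as np

rng = np.random.default_rng(5)

def crandn(*shape):
    """Complex Gaussian test matrix."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

n, m = 5, 3
V_full = crandn(n, m)                        # rank m
V_def = V_full @ np.diag([1.0, 1.0, 0.0])    # last column zero, rank m - 1
W, _ = np.linalg.qr(crandn(n, n))            # unitary

norms, projection_ok = [], []
for V in (V_full, V_def):
    U = W @ V                                # guarantees U*U = V*V
    V_pinv = np.linalg.pinv(V, rcond=1e-10)  # Moore-Penrose pseudo-inverse
    norms.append(np.linalg.norm(U @ V_pinv, 2))
    projection_ok.append(np.allclose(U @ V_pinv @ V, U))

print(norms, projection_ok)
```

Both spectral norms should equal 1 up to rounding, and $UV^\dagger V = U$ holds in both cases because ker(V) = ker(U) here.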


In view of Remark 13.1, we have the following result. 


Corollary 13.2 (Rank Deficient) Let $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$. Let $U \in \mathbb{C}^{n\times m}$ and
$V \in \mathbb{C}^{n\times m}$ be left and right singular vectors of $T_{A,B}$ corresponding to the singular
value $\sigma_m(A, B)$, respectively. Suppose that $UV^\dagger V = U$ and $U^*U = V^*V$, where
$V^\dagger$ is the Moore-Penrose pseudo-inverse of V. Define

$$\Delta A := -\sigma_m(A, B)UV^\dagger.$$

Then $(A + \Delta A)V = VB$ and $\|\Delta A\|_2 = \sigma_m(A, B)$. Further, span(V) is an invariant
subspace of $A + \Delta A$ corresponding to some eigenvalues in $\operatorname{eig}(A + \Delta A) \cap \operatorname{eig}(B)$.


Next, we characterize the condition $UV^\dagger V = U$ when U and V are left and right
singular vectors of $T_{A,B}$ corresponding to $\sigma_m(A, B)$. We have the following result.


Proposition 13.2 Let U and V be left and right singular vectors of $T_{A,B}$ corresponding
to $\sigma_m(A, B) \neq 0$, respectively. Then the following conditions are equivalent.


(a) $B\ker(V) \subset \ker(V)$.
(b) $\ker(V) \subset \ker(U)$.
(c) $UV^\dagger V = U$.


In particular, if $U^*U = V^*V$ then ker(V) = ker(U) and $B\ker(V) \subset \ker(V)$.


Proof First, recall that $T_{A,B}(V) = \sigma_m(A, B)U \implies AV - VB = \sigma_m(A, B)U$.
Now, assume that (a) holds. Let $x \in \ker(V)$. Then $Bx \in \ker(V)$. Consequently,
we have $\sigma_m(A, B)Ux = AVx - VBx = 0$, which shows that Ux = 0. Hence $x \in
\ker(U)$. Since x is arbitrary, we have $\ker(V) \subset \ker(U)$. This proves (a)$\implies$(b).

Next, suppose that (b) holds. Observe that $V^\dagger V$ is an orthogonal projection on
$\operatorname{span}(V^*)$ and $V(I_m - V^\dagger V) = V - VV^\dagger V = V - V = 0$. Hence for $x \in \mathbb{C}^m$, we
have

$$x = V^\dagger Vx + (I_m - V^\dagger V)x \quad\text{and}\quad (I_m - V^\dagger V)x \in \ker(V) \subset \ker(U).$$

Consequently, we have $UV^\dagger Vx = U(V^\dagger Vx + (I_m - V^\dagger V)x) = Ux$ for all $x \in
\mathbb{C}^m$. Hence $UV^\dagger V = U$. This proves (b)$\implies$(c).

Finally, assume that (c) holds. Let $x \in \ker(V)$. Then $U = UV^\dagger V \implies Ux =
UV^\dagger Vx = 0$. Hence $(AV - VB)x = \sigma_m(A, B)Ux \implies VBx = 0 \implies Bx \in
\ker(V)$. Since $x \in \ker(V)$ is arbitrary, we have $B\ker(V) \subset \ker(V)$. This proves
(c)$\implies$(a).

Note that $U^*U = V^*V \implies \ker(V) = \ker(U)$. Next, $AV - VB = \sigma_m(A, B)U$
shows that if $x \in \ker(V)$ then $VBx = 0 \implies Bx \in \ker(V)$. Hence $B\ker(V) \subset
\ker(V)$. This completes the proof.


Now, we analyze the conditions U*U = V*V and rank(V) = m and determine suf- 
ficient conditions under which these conditions are satisfied. 


Theorem 13.7 Let U and V be left and right singular vectors of $T_{A,B}$ corresponding
to $\sigma_m(A, B)$. Suppose that $\sigma_m(A, B) \neq 0$ and that $C := BU^*V - U^*VB$ is
nilpotent. Then $U^*U = V^*V$. If B is nonderogatory then there exists a polynomial
p of degree $\le m - 1$ such that $U^*V = p(B)$. If $p(\lambda) \neq 0$ for $\lambda \in \operatorname{eig}(B)$ then
rank(V) = m.


Proof Since U and V are left and right singular vectors of $T_{A,B}$, we have

$$AV - VB = \sigma_m(A, B)U \quad\text{and}\quad A^*U - UB^* = \sigma_m(A, B)V.$$

This gives $\sigma_m(A, B)(U^*U - V^*V) = BU^*V - U^*VB = C$, which shows that C is
Hermitian. Since C is nilpotent, we have C = 0. Now $C = 0 \implies U^*U = V^*V$ as
$\sigma_m(A, B)$ is nonzero.
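For readers who wish to see the intermediate step, the identity $\sigma_m(A, B)(U^*U - V^*V) = C$ used above can be obtained as follows. This is a short derivation sketch written with the abbreviation $\sigma := \sigma_m(A, B)$, which is introduced here only for brevity.

```latex
\begin{align*}
U^*(AV - VB) = \sigma\, U^*U
  &\;\Longrightarrow\; U^*AV - U^*VB = \sigma\, U^*U, \\
(A^*U - UB^*)^*V = \sigma\, V^*V
  &\;\Longrightarrow\; U^*AV - BU^*V = \sigma\, V^*V, \\
\text{subtracting:}\quad
\sigma\,(U^*U - V^*V) &= BU^*V - U^*VB = C .
\end{align*}
```

Since $\sigma$ is real and $U^*U - V^*V$ is Hermitian, C is Hermitian; a Hermitian nilpotent matrix has all eigenvalues zero and is diagonalizable, hence it is the zero matrix.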

Finally, since $U^*V$ commutes with B and B is nonderogatory, by Horn and Johnson
[11, Theorem 3.2.4.2, p.178], we have $U^*V = p(B)$ for some polynomial p of
degree at most m − 1. Since p(B) is nonsingular as $p(\lambda) \neq 0$ for $\lambda \in \operatorname{eig}(B)$, we
have that $U^*V = p(B)$ is nonsingular. This shows that rank(V) = m.


Next, we characterize the condition U*U = V*V. These characterizations provide 
insight into the relation between U, V and B. Recall that 


$$\mathcal{C}(B) := \{X \in \mathbb{C}^{m\times m} : BX = XB\}$$


is the centralizer of B. We have seen that $\dim(\mathcal{C}(B)) \ge m$.
Theorem 13.8 Let U and V be left and right singular vectors of $T_{A,B}$ corresponding
to $\sigma_m(A, B)$. Suppose that $\sigma_m(A, B) \neq 0$. Then the following conditions are
equivalent.


(a) $C := BU^*V - U^*VB$ is nilpotent.

(b) $U^*V \in \mathcal{C}(B)$.

(c) $U^*U = V^*V$.

(d) $U = WV$ for some unitary matrix $W \in \mathbb{C}^{n\times n}$.

(e) Either $U^*V$ or B commutes with C.


In particular, if both B and $U^*V$ are upper triangular then $U^*U = V^*V$.


Proof The equivalence of (a), (b) and (c) follows from the proof of Theorem 13.7.
By Horn and Johnson [11, Theorem 7.3.11, p.452], $U^*U = V^*V \iff$ there is a
unitary matrix $W \in \mathbb{C}^{n\times n}$ such that U = WV. This proves the equivalence of (c) and
(d).

Now, if (e) holds then by Jacobson's Lemma (Horn and Johnson [11, Exercise
2.4.P12, p.126]), we have $C^m = 0$. Hence C is nilpotent. On the other hand,
if C is nilpotent then by Theorem 13.7, we have C = 0. Consequently, C commutes
with both $U^*V$ and B. This proves the equivalence of (a) and (e).

Finally, if $U^*V$ and B are upper triangular then $C := BU^*V - U^*VB$ is strictly
upper triangular and hence nilpotent. Consequently, C = 0 and hence $U^*V \in \mathcal{C}(B)$,
which in turn implies that $U^*U = V^*V$.


Remark 13.2 Without loss of generality we can assume that B is upper triangular. 
In such a case, if U*V is upper triangular then C := BU*V — U*VB is nilpotent. 
In fact, U*V € C(B) when both B and U*V are upper triangular. However, it is not 
necessary that U*V be upper triangular in order for it to be in C(B). 


To summarize: We have seen that $B \subset A + \Delta A \implies S^{-1}BS \subset A + \Delta A$ and that
$d(A, B) = d(A, S^{-1}BS)$ for any m × m nonsingular matrix S. We have also
seen that $B \subset A + \Delta A \implies \sigma_m(A, B) \le \|\Delta A\|_2 \implies \sigma_m(A, B) \le d(A, B)$ and the
equality holds under the assumptions given in Theorem 13.6. By Proposition 13.1,
we have $\sigma_m(A, W^*BW) = \sigma_m(A, B)$ for any m × m unitary matrix W. Hence by
Theorem 13.6, we have

$$d(A, B) = \sigma_m(A, B) = \sigma_m(A, W^*BW) = d(A, W^*BW)$$

for any unitary matrix W. Thus, without loss of generality, we can assume that B
is an upper triangular matrix. Indeed, by the Schur theorem there is a unitary matrix W
such that $W^*BW$ is upper triangular.

Our next aim is to derive a singular value characterization of d(A, B) assuming that
B is upper triangular and relate the conditions in Theorem 13.6 with optimization of
the singular value function $X \mapsto \sigma_m(A, X)$. By Proposition 13.1, the singular value
function $X \mapsto \sigma_m(A, X)$ is Lipschitz continuous. Hence the Clarke subdifferential of
$\sigma_m(A, X)$ can be gainfully utilized for optimization of the function $X \mapsto \sigma_m(A, X)$.
In the next section, we determine the Clarke subdifferential of locally Lipschitz singular
value functions.
13.4 Clarke Subdifferential of Singular Value Functions


Let $\mathcal{H}$ be a finite dimensional Hilbert space over the field $\mathbb{C}$. A function $f : \mathcal{H} \to \mathbb{R}$
is said to be locally Lipschitz continuous at $x_0 \in \mathcal{H}$ if there is a neighbourhood
$\operatorname{Nbd}(x_0)$ of $x_0$ and a constant $\alpha > 0$ such that

$$x, y \in \operatorname{Nbd}(x_0) \implies |f(x) - f(y)| \le \alpha\|x - y\|.$$

We consider the Clarke subdifferential of a locally Lipschitz function and refer to it as
the subdifferential. We denote the real part of a complex number z by re(z).


Definition 13.2 (Clarke [6]) Let $f : \mathcal{H} \to \mathbb{R}$ be a locally Lipschitz continuous function
at $x_0 \in \mathcal{H}$. The generalized directional derivative of f at $x_0$ in the direction of
$u \in \mathcal{H}$ is denoted by $f^\circ(x_0; u)$ and is defined by

$$f^\circ(x_0; u) := \limsup_{x \to x_0,\ t \to 0+} \frac{f(x + tu) - f(x)}{t}.$$


Definition 13.3 (Clarke [6]) Let $f : \mathcal{H} \to \mathbb{R}$ be a locally Lipschitz continuous function
at $x_0 \in \mathcal{H}$. Then the subdifferential of f at $x_0$ is denoted by $\partial f(x_0)$ and is defined
by

$$\partial f(x_0) := \{x \in \mathcal{H} : f^\circ(x_0; u) \ge \operatorname{re}\langle u, x\rangle \text{ for all } u \in \mathcal{H}\}.$$


Each $v \in \partial f(x_0)$ is called a subgradient of f at $x_0$.


Definition 13.4 (Clarke Stationary Point) Let $f : \mathcal{H} \to \mathbb{R}$ be locally Lipschitz and
$x_0 \in \mathcal{H}$. Then $x_0$ is said to be a Clarke stationary point of f if $0 \in \partial f(x_0)$.


We now prove the chain rule for the subdifferential of $g \circ f$ when g is Lipschitz and f
is continuously differentiable. This will help us to determine subdifferentials of locally
Lipschitz singular value functions.


Theorem 13.9 Let $\mathcal{U}$ and $\mathcal{V}$ be finite dimensional Hilbert spaces over $\mathbb{C}$. Let $f :
\mathcal{U} \to \mathcal{V}$ be continuously differentiable in a neighbourhood of $x_0 \in \mathcal{U}$ and $g : \mathcal{V} \to \mathbb{R}$
be Lipschitz continuous. Then the subdifferential of $g \circ f$ at $x_0$ is given by

$$\partial(g \circ f)(x_0) = (Df(x_0))^*\,\partial g(f(x_0)),$$

where $Df(x_0)$ is the (Frechet) derivative of f at $x_0$, $(Df(x_0))^*$ is the adjoint operator
of the linear map $Df(x_0) : \mathcal{U} \to \mathcal{V}$, $h \mapsto Df(x_0)h$, and $\partial g(f(x_0))$ is the
subdifferential of g at $f(x_0)$.


Proof Since f is continuously differentiable in a neighbourhood of $x_0$ and g is
Lipschitz, it follows that $g \circ f$ is locally Lipschitz at $x_0$. Hence $\partial(g \circ f)(x_0)$ is well
defined. By the chain rule (Clarke [6]), we have

$$\partial(g \circ f)(x_0) \subset (Df(x_0))^*\,\partial g(f(x_0)).$$



We now show that the equality holds. Let $y \in (Df(x_0))^*\,\partial g(f(x_0))$. Then we have
$y = (Df(x_0))^*v$ for some $v \in \partial g(f(x_0))$. Consequently, for $x \in \mathcal{U}$, we have

$$\langle x, y\rangle = \langle x, (Df(x_0))^*v\rangle = \langle Df(x_0)x, v\rangle. \tag{13.13}$$

Set $h := g \circ f$. We calculate the generalized directional derivative $h^\circ(x_0; u)$ for
$u \in \mathcal{U}$. Note that $f(x + tu) = f(x) + tDf(x)u + \epsilon(t)$ for some $\epsilon(t)$ such that
$\epsilon(t)/t \to 0$ as $t \to 0$. Hence $h(x + tu) = g(f(x + tu)) = g(f(x) + tDf(x)u +
\epsilon(t))$. Since g is Lipschitz, we have (absorbing the Lipschitz constant of g and the
norm of the remainder into $\epsilon(t)$)

$$g(f(x) + tDf(x)u) - \epsilon(t) \le g(f(x) + tDf(x)u + \epsilon(t)) \le g(f(x) + tDf(x)u) + \epsilon(t).$$

Consequently, we have $(h(x + tu) - h(x))/t \le (g(f(x) + tDf(x)u) - g(f(x)))/t
+ \epsilon(t)/t$ and $(h(x + tu) - h(x))/t \ge (g(f(x) + tDf(x)u) - g(f(x)))/t
- \epsilon(t)/t$ for t > 0. Hence taking limsup as $x \to x_0$ and $t \to 0+$, we have $h^\circ(x_0; u) =
g^\circ(f(x_0); Df(x_0)u)$ for all $u \in \mathcal{U}$. Now by (13.13) and the definition of the
subdifferential, we have

$$\operatorname{re}\langle x, y\rangle = \operatorname{re}\langle Df(x_0)x, v\rangle \le g^\circ(f(x_0); Df(x_0)x) = h^\circ(x_0; x)$$

for all $x \in \mathcal{U}$, which shows that $y \in \partial h(x_0) = \partial(g \circ f)(x_0)$. This completes the
proof.


Let $A \in \mathbb{C}^{m\times n}$. We denote the singular values of A by $\sigma_1(A) \ge \cdots \ge \sigma_p(A) \ge 0$,
where p = min(m, n). Consider the m × n diagonal matrix $\Sigma := \operatorname{diag}(\sigma_1(A), \dots,
\sigma_p(A))$. Then there exist unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ such that $A =
U\Sigma V^*$ is a singular value decomposition (SVD) of A. The columns of U are left
singular vectors and the columns of V are right singular vectors of A. We write $\sigma(A)$
to denote a singular value of A. If u and v are left and right singular vectors of A
corresponding to $\sigma(A)$ then $Av = \sigma(A)u$ and $A^*u = \sigma(A)v$. If $\ell$ is the multiplicity
of $\sigma(A)$ then there exist isometries $U \in \mathbb{C}^{m\times\ell}$ and $V \in \mathbb{C}^{n\times\ell}$ such that $AV = \sigma(A)U$
and $A^*U = \sigma(A)V$.


Definition 13.5 Let $\mathcal{U} \subset \mathbb{C}^{p\times q}$ be an open set. A function $f : \mathcal{U} \to \mathbb{R}$ is said to
be a singular value function if f is of the form

$$f(X) = \sigma(F(X)), \quad X \in \mathcal{U},$$

for some function $F : \mathcal{U} \to \mathbb{C}^{m\times n}$, where $\sigma(F(X))$ is a singular value of the matrix
F(X).


If F(X) is continuously differentiable on $\mathcal{U}$ then it follows that the singular value
function $f(X) := \sigma(F(X))$ is locally Lipschitz continuous on $\mathcal{U}$. We now determine
the subdifferential of a locally Lipschitz singular value function.


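Theorem 13.10 Let $A : \mathbb{C} \to \mathbb{C}^{m\times n}$ be continuously differentiable in a neighbourhood
of $\lambda \in \mathbb{C}$. Let $g : \mathbb{C} \to \mathbb{R}$ be given by $g(z) := \sigma(A(z))$, where $\sigma$ is a singular
value of A(z). Suppose that $g(\lambda) \neq 0$ and that the multiplicity of $\sigma(A(\lambda))$ is p. Then
the subdifferential of g at $\lambda$ is given by

$$\partial g(\lambda) = \mathcal{F}(V^*(A'(\lambda))^*U),$$

where $U \in \mathbb{C}^{m\times p}$ and $V \in \mathbb{C}^{n\times p}$ have orthonormal columns such that $A(\lambda)V =
g(\lambda)U$ and $(A(\lambda))^*U = g(\lambda)V$. Here $\mathcal{F}(X)$ denotes the field of values of a square
matrix X and $A'(\lambda)$ denotes the derivative of A(z) at $\lambda$.


Proof First note that g(z) is locally Lipschitz at $\lambda$. Indeed, since A(z) is continuously
differentiable in a neighbourhood of $\lambda$ and $\sigma : \mathbb{C}^{m\times n} \to \mathbb{R}$ is Lipschitz, it follows
that the function $g = \sigma \circ A$ is locally Lipschitz at $\lambda$. Consequently, by Theorem 13.9,
we have

$$\partial g(\lambda) = (DA(\lambda))^*\,\partial\sigma(A(\lambda)),$$

where $\partial\sigma(A(\lambda))$ is the subdifferential of the map $X \mapsto \sigma(X)$ evaluated at $A(\lambda)$.

Note that the derivative $DA(\lambda) : \mathbb{C} \to \mathbb{C}^{m\times n}$, $z \mapsto A'(\lambda)z$, is a linear map
and its adjoint $(DA(\lambda))^* : \mathbb{C}^{m\times n} \to \mathbb{C}$ is given by $(DA(\lambda))^*X = \operatorname{Tr}((A'(\lambda))^*X)$.
Indeed, we have

$$\langle z, (DA(\lambda))^*X\rangle = \langle DA(\lambda)z, X\rangle = \langle A'(\lambda)z, X\rangle = \operatorname{Tr}(X^*A'(\lambda)z) = z\operatorname{Tr}(X^*A'(\lambda)) = \langle z, \overline{\operatorname{Tr}(X^*A'(\lambda))}\rangle,$$

which shows that $(DA(\lambda))^*X = \overline{\operatorname{Tr}(X^*A'(\lambda))} = \operatorname{Tr}((A'(\lambda))^*X)$.
Next, by Lewis and Sendov [19, Corollary 6.4] the subdifferential of $X \mapsto \sigma(X)$
at $A(\lambda)$ is given by

$$\partial\sigma(A(\lambda)) = \text{Convex hull}\,\{Uxx^*V^* : x \in \mathbb{C}^p \text{ and } \|x\|_2 = 1\}.$$

Consequently, we have

$$\partial g(\lambda) = \text{Convex hull}\,\{\operatorname{Tr}((A'(\lambda))^*Uxx^*V^*) : x \in \mathbb{C}^p \text{ and } \|x\|_2 = 1\} = \text{Convex hull}\,\{x^*(V^*(A'(\lambda))^*U)x : x \in \mathbb{C}^p \text{ and } \|x\|_2 = 1\} = \mathcal{F}(V^*(A'(\lambda))^*U).$$

This completes the proof.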
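When the singular value is simple (p = 1), the field of values in Theorem 13.10 reduces to the single number $w = v^*(A'(\lambda))^*u$, and the derivative of g in a direction z is $\operatorname{re}\langle z, w\rangle = \operatorname{re}(\bar{w}z)$. The sketch below compares this with a central finite difference; it assumes NumPy, an affine family $A(z) = A_0 + zA_1$ with random coefficients, and the largest singular value, all of which are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 4
A0 = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A1 = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

def A_of(z):
    """Affine matrix family A(z) = A0 + z * A1, so A'(z) = A1."""
    return A0 + z * A1

def g(z):
    """Largest singular value of A(z) (simple for generic data)."""
    return np.linalg.svd(A_of(z), compute_uv=False)[0]

lam = 0.2 - 0.1j
W, s, Zh = np.linalg.svd(A_of(lam))
u = W[:, 0]           # left singular vector for s[0]
v = Zh[0, :].conj()   # right singular vector for s[0]

# Single subgradient w = v* (A'(lam))* u; np.vdot conjugates its first argument.
w = np.vdot(v, A1.conj().T @ u)

# Directional derivative along z: re<z, w> = re(conj(w) * z).
z = 0.6 + 0.8j
predicted = (np.conj(w) * z).real

# Central finite difference of g along the direction z.
t = 1e-6
finite_diff = (g(lam + t * z) - g(lam - t * z)) / (2 * t)
print(predicted, finite_diff)
```

The two printed values should agree to several digits; at a multiple singular value g is in general not differentiable and the set-valued formula of the theorem is needed instead.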


Next, we consider partial and directional subdifferentials of singular values. Let
$\sigma : \mathbb{C}^{m\times n} \to \mathbb{R}$, $X \mapsto \sigma(X)$, be a singular value map, that is, $\sigma(X)$ is a singular
value of X. Suppose that $\sigma(X)$ is nonzero. We define the directional subdifferential
of $\sigma$ at X in the direction of $E \in \mathbb{C}^{m\times n}$ by

$$\partial_E\sigma(X) := \partial g(0), \tag{13.14}$$

where $g : \mathbb{R} \to \mathbb{R}$ is given by $g(t) := \sigma(X + tE)$. For the special case when
$E = e_ie_j^\top$, the canonical basis element, we denote the directional subdifferential
by $\partial_{ij}\sigma(X)$ and refer to it as the (i, j)-th partial subdifferential of $\sigma$ at X. By
Theorem 13.10 we have

$$\partial_E\sigma(X) = \mathcal{F}(V^*E^*U) \quad\text{and}\quad \partial_{ij}\sigma(X) = \mathcal{F}(V^*e_je_i^\top U), \tag{13.15}$$

where U and V are isometries such that $XV = \sigma(X)U$ and $X^*U = \sigma(X)V$.


Theorem 13.11 Let $F : \mathbb{C}^{p\times q} \to \mathbb{C}^{m\times n}$ be continuously differentiable in a
neighbourhood of $A \in \mathbb{C}^{p\times q}$. Define $g : \mathbb{C}^{p\times q} \to \mathbb{R}$ by $g(X) := \sigma(F(X))$, where $\sigma :
\mathbb{C}^{m\times n} \to \mathbb{R}$, $X \mapsto \sigma(X)$, is a singular value map. Suppose that $g(A) \neq 0$ and that
the multiplicity of g(A) is r. Let $U \in \mathbb{C}^{m\times r}$ and $V \in \mathbb{C}^{n\times r}$ be isometries such that
$F(A)V = g(A)U$ and $(F(A))^*U = g(A)V$. Then for $E \in \mathbb{C}^{p\times q}$ we have

$$\partial_E g(A) = \mathcal{F}\big(V^*(DF(A)E)^*U\big) \quad\text{and}\quad \partial_{ij}g(A) = \mathcal{F}\Big(V^*\Big(\frac{\partial F(A)}{\partial X_{ij}}\Big)^*U\Big),$$

where DF(A) is the derivative of F at A and $\frac{\partial F(A)}{\partial X_{ij}}$ is the partial derivative of
F(X) with respect to the (i, j)-th entry of X evaluated at A.


Proof For $E \in \mathbb{C}^{p\times q}$, define $g_E : \mathbb{R} \to \mathbb{R}$ by $g_E(t) := g(A + tE)$. Then $\partial_E g(A) =
\partial g_E(0)$. Hence by Theorem 13.10, we have $\partial_E g(A) = \mathcal{F}(V^*(DF(A)E)^*U)$. Now
by considering $E := E_{ij}$, the matrix with (i, j)-th entry 1 and zero everywhere else,
we obtain the desired result for the partial subdifferential.


For the special case when r = 1 (that is, when g(A) is a simple singular value) and
q = 1, Theorem 13.11 gives the result obtained by Ji-Guang [12].

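A special case of Theorem 13.11 will be useful in the subsequent development.
Suppose that $F : \mathbb{C}^{p\times q} \to \mathbb{C}^{km\times \ell n}$ is given by

$$F(X) := \theta(X) \otimes B,$$

where $B \in \mathbb{C}^{m\times n}$ and $\theta : \mathbb{C}^{p\times q} \to \mathbb{C}^{k\times\ell}$ is continuously differentiable in a
neighbourhood of A. Then by Theorem 13.11, we have

$$\partial_E g(A) = \mathcal{F}\big(V^*(D\theta(A)E \otimes B)^*U\big), \quad \partial_{ij}g(A) = \mathcal{F}\Big(V^*\Big(\frac{\partial\theta(A)}{\partial X_{ij}} \otimes B\Big)^*U\Big).$$

Further, a partial subgradient $w \in \partial_{ij}g(A)$ is given by

$$w = \operatorname{Tr}\Big(V_x^*B^*U_x\Big(\Big(\frac{\partial\theta(A)}{\partial X_{ij}}\Big)^{\top}\Big)^*\Big),$$

where $V_x \in \mathbb{C}^{n\times\ell}$ and $U_x \in \mathbb{C}^{m\times k}$ are such that $\operatorname{Vec}(V_x) = Vx$ and $\operatorname{Vec}(U_x) = Ux$
for some $x \in \mathbb{C}^r$. Indeed, we have $w = x^*V^*\big(\frac{\partial\theta(A)}{\partial X_{ij}} \otimes B\big)^*Ux$ for some $x \in \mathbb{C}^r$.
Then with $U_x$ and $V_x$ as defined above, we have

$$w = \operatorname{Vec}(V_x)^*\operatorname{Vec}\Big(B^*U_x\Big(\Big(\frac{\partial\theta(A)}{\partial X_{ij}}\Big)^{\top}\Big)^*\Big) = \operatorname{Tr}\Big(V_x^*B^*U_x\Big(\Big(\frac{\partial\theta(A)}{\partial X_{ij}}\Big)^{\top}\Big)^*\Big)$$

as desired.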


13.5 Singular Value Characterization of Eigenvalue 
Distance Problem 


We have seen that $d(A, B) = d(A, S^{-1}BS)$ for any m × m nonsingular matrix S.
Thus, if $J := S^{-1}BS$ is the Jordan canonical form of B then d(A, B) = d(A, J). In
particular, for any unitary matrix W, we have

$$d(A, B) = d(A, W^*BW) \ge \sigma_m(A, W^*BW) = \sigma_m(A, B). \tag{13.17}$$

Hence, without loss of generality, we can assume that B is upper triangular.
Recall that $\mathrm{UN}(m,\mathbb{C}) \subset \mathbb{C}^{m\times m}$ is the space of m × m upper triangular nilpotent
matrices. By the Schur theorem, there is a unitary matrix $W \in \mathbb{C}^{m\times m}$ such that

$$W^*BW = D + N \quad\text{and}\quad \operatorname{eig}(D) = \operatorname{eig}(B) = \Lambda, \tag{13.18}$$

where D is a diagonal matrix and $N \in \mathrm{UN}(m,\mathbb{C})$. The eigenvalues of B can be
made to appear in any order in D by choosing an appropriate W. This means that
$\sigma_m(A, D+N) = \sigma_m(A, B)$ is independent of the order of the eigenvalues that
appear in D. Henceforth, we assume that B is upper triangular and is given by
B = D + N with D := diag(B) and $N \in \mathrm{UN}(m,\mathbb{C})$.

It is proved in Gracia [10] that $\sigma_m(A, D+N) \to 0$ as $\|N\|_2 \to \infty$ when m = 2.
We now show that $\sigma_m(A, D+N) \to 0$ as $\|N\|_2 \to \infty$ for any $m \le n$.
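The decay of $\sigma_m(A, D+N)$ for large $\|N\|_2$ can be observed numerically. The sketch below assumes NumPy and uses a deliberately simple example with m = 2, a diagonal A and $N = tE_{12}$, chosen so that the values can be checked by hand; none of these choices come from the text.

```python
import numpy as np

def sylvester_matrix(A, B):
    """Matrix of X -> AX - XB acting on vec(X): I_m (x) A - B^T (x) I_n."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))

def sigma(A, B):
    """Singular values of T_{A,B} in ascending order."""
    return np.linalg.svd(sylvester_matrix(A, B), compute_uv=False)[::-1]

m = 2
A = np.diag([1.0, 2.0, 3.0, 4.0])
D = np.diag([0.0, 0.5])

def f(t):
    """f(N) = sigma_m(A, D + N) for N = t * E_12 (strictly upper triangular)."""
    N = np.array([[0.0, t], [0.0, 0.0]])
    return sigma(A, D + N)[m - 1]

values = {t: f(t) for t in (0.0, 1e1, 1e3, 1e6)}
for t, val in values.items():
    print(f"t = {t:9.1e}   f = {val:.3e}")
```

For this example the Kronecker matrix decouples into 2 × 2 blocks, one per diagonal entry a of A, whose smaller singular value behaves like $a\,|a - 0.5|/t$ for large t, so f decays roughly like 3/t.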


Lemma 13.1 Define $f : \mathrm{UN}(m,\mathbb{C}) \to \mathbb{R}$ by $f(N) := \sigma_m(A, D+N)$. Then we
have $f(N) \to 0$ as $\|N\|_2 \to \infty$. Further, there exists $N_{\mathrm{opt}} \in \mathrm{UN}(m,\mathbb{C})$ such that

$$\sup\{f(N) : N \in \mathrm{UN}(m,\mathbb{C})\} = \max\{f(N) : N \in \mathrm{UN}(m,\mathbb{C})\} = f(N_{\mathrm{opt}}).$$

Proof We have

$$f(N) = \sigma_{mn-m+1}(I_m \otimes A - (D+N)^\top \otimes I_n) = \sigma_{mn-m+1}(\mathcal{A} - \mathcal{N}),$$

where $\mathcal{A} := I_m \otimes A - D \otimes I_n$ is block diagonal and $\mathcal{N} := N^\top \otimes I_n$ is strictly block
lower triangular and hence nilpotent.

If $D = \operatorname{diag}(\mu_1, \dots, \mu_m)$ then $\mathcal{A} = (A - \mu_1I_n) \oplus \cdots \oplus (A - \mu_mI_n)$. Thus $\mathcal{A} -
\mathcal{N}$ is block lower triangular. Note that $\mathcal{A} - \mathcal{N}$ is invertible $\iff \mathcal{A}$ is invertible. For
i = 1 : m − 1 and j = 2 : m, let $r_{ij}$ denote the (i, j)-th entry of the strictly upper
triangular matrix N. Then $\|N\|_2 \to \infty \iff |r_{ij}| \to \infty$ for some (i, j).

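First, suppose that $\mathrm{eig}(D) \cap \mathrm{eig}(A) = \emptyset$. Then $\mathcal{A}$ is invertible and $\mathcal{N}\mathcal{A}^{-1}$ is strictly
block lower triangular and hence nilpotent and is given by

$$\mathcal{N}\mathcal{A}^{-1} = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
r_{12}(A - \mu_1 I_n)^{-1} & 0 & 0 & \cdots & 0 \\
r_{13}(A - \mu_1 I_n)^{-1} & r_{23}(A - \mu_2 I_n)^{-1} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
r_{1m}(A - \mu_1 I_n)^{-1} & r_{2m}(A - \mu_2 I_n)^{-1} & \cdots & r_{m-1,m}(A - \mu_{m-1} I_n)^{-1} & 0
\end{bmatrix}.$$

Note that $r_{12}(A - \mu_1 I_n)^{-1}, r_{23}(A - \mu_2 I_n)^{-1}, \dots, r_{m-1,m}(A - \mu_{m-1} I_n)^{-1}$ are the
first sub-diagonal blocks of $\mathcal{N}\mathcal{A}^{-1}$. Similarly, $(\mathcal{N}\mathcal{A}^{-1})^2$ is block lower triangular
whose first sub-diagonal blocks consist of zero blocks and the second sub-diagonal
blocks are given by

$$r_{12}r_{23}(A - \mu_1 I_n)^{-1}(A - \mu_2 I_n)^{-1}, \quad r_{23}r_{34}(A - \mu_2 I_n)^{-1}(A - \mu_3 I_n)^{-1}, \quad \dots, \quad r_{m-2,m-1}r_{m-1,m}(A - \mu_{m-2} I_n)^{-1}(A - \mu_{m-1} I_n)^{-1}.$$

More generally, $(\mathcal{N}\mathcal{A}^{-1})^j$ is block lower triangular and the first $j - 1$ sub-diagonal
blocks consist of zero blocks. Hence $(\mathcal{N}\mathcal{A}^{-1})^{m-1}$ has only one nonzero sub-diagonal block
(the $(m-1)$-th sub-diagonal block) given by $\prod_{j=1}^{m-1} r_{j,j+1}(A - \mu_j I_n)^{-1}$ and $(\mathcal{N}\mathcal{A}^{-1})^m = 0$.
Consequently, we have

$$(\mathcal{A} - \mathcal{N})^{-1} = \mathcal{A}^{-1} + \mathcal{A}^{-1}\sum_{j=1}^{m-1} (\mathcal{N}\mathcal{A}^{-1})^j =: \mathcal{A}^{-1} + F(N).$$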


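Note that $\mathrm{rank}\big((\mathcal{N}\mathcal{A}^{-1})^j\big) \ge m$ if $(\mathcal{N}\mathcal{A}^{-1})^j \ne 0$. Hence it follows that $F(N)$ is
strictly block lower triangular and hence nilpotent and $\mathrm{rank}(F(N)) \ge m$.
If $X_{ij}$ denotes the $(i, j)$-th block of $F(N)$, with $i > j$, then it follows that

$$X_{ij} = (A - \mu_i I_n)^{-1}\, r_{ji}\, (A - \mu_j I_n)^{-1} + \text{other terms}.$$

Hence $\sigma_m(X_{ij}) \to \infty$ as $|r_{ji}| \to \infty$. It is well known that $\sigma_m(F(N)) \ge \sigma_m(X_{ij})$.
Consequently, $\sigma_m(F(N)) \to \infty$ as $|r_{ji}| \to \infty$. This shows that $\sigma_m(F(N)) \to \infty$ as
$\|N\|_2 \to \infty$. Next, we have

$$f(N) = \sigma_{mn-m+1}(\mathcal{A} - \mathcal{N}) = \frac{1}{\sigma_m\big((\mathcal{A} - \mathcal{N})^{-1}\big)} = \frac{1}{\sigma_m\big(\mathcal{A}^{-1} + F(N)\big)}.$$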
Now using the fact that $|\sigma_m(\mathcal{A}^{-1} + F(N)) - \sigma_m(F(N))| \le \|\mathcal{A}^{-1}\|_2$, we have

$$\sigma_m(\mathcal{A}^{-1} + F(N)) \ge \sigma_m(F(N)) - \|\mathcal{A}^{-1}\|_2 \to \infty \quad \text{as } \|N\|_2 \to \infty$$

because $\sigma_m(F(N)) \to \infty$ as $\|N\|_2 \to \infty$. Hence $f(N) = 1/\sigma_m(\mathcal{A}^{-1} + F(N)) \to 0$
as $\|N\|_2 \to \infty$. This proves the result when $\mathcal{A}$ is nonsingular.


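Now suppose that $\mathcal{A}$ is singular. For any $\epsilon > 0$, we choose $\mathcal{A}_\epsilon := A_1 \oplus \cdots \oplus A_m$
such that $\mathcal{A}_\epsilon$ is nonsingular and

$$\|\mathcal{A} - \mathcal{A}_\epsilon\|_2 = \max\{\|A - \mu_j I_n - A_j\|_2 : j = 1 : m\} < \epsilon/2.$$

Then as above we have $\sigma_{mn-m+1}(\mathcal{A}_\epsilon - \mathcal{N}) \to 0$ as $\|N\|_2 \to \infty$. Hence there exists
$M > 0$ such that $\sigma_{mn-m+1}(\mathcal{A}_\epsilon - \mathcal{N}) < \epsilon/2$ for all $\|N\|_2 \ge M$. Now

$$|\sigma_{mn-m+1}(\mathcal{A} - \mathcal{N}) - \sigma_{mn-m+1}(\mathcal{A}_\epsilon - \mathcal{N})| \le \|\mathcal{A} - \mathcal{A}_\epsilon\|_2 < \epsilon/2$$

shows that $f(N) = \sigma_{mn-m+1}(\mathcal{A} - \mathcal{N}) < \sigma_{mn-m+1}(\mathcal{A}_\epsilon - \mathcal{N}) + \epsilon/2 < \epsilon$ for all
$\|N\|_2 \ge M$. This proves that $f(N) \to 0$ as $\|N\|_2 \to \infty$.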

Finally, since $f(N)$ is Lipschitz continuous and $f(N) \to 0$ as $\|N\|_2 \to \infty$, there
is a closed bounded (compact) subset $S \subset \mathrm{UN}(m, \mathbb{C})$ such that

$$\sup\{ f(N) : N \in \mathrm{UN}(m, \mathbb{C})\} = \sup\{ f(N) : N \in S\} = \max\{ f(N) : N \in S\} = f(N_{\mathrm{opt}})$$

for some $N_{\mathrm{opt}} \in S$ as $S$ is compact. This completes the proof.
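The decay in Lemma 13.1 can be observed on a small instance. This is a sketch under assumed data: the matrices `A`, `D` and the direction `E` are made up for illustration, and `f` is a hypothetical helper that evaluates $\sigma_{mn-m+1}(I_m \otimes A - (D+N)^{T} \otimes I_n)$ with $m = 2$, $n = 3$.

```python
import numpy as np

def f(A, D, N):
    """Hypothetical helper: sigma_{mn-m+1}(I_m (x) A - (D+N)^T (x) I_n)."""
    n, m = A.shape[0], D.shape[0]
    T = np.kron(np.eye(m), A) - np.kron((D + N).T, np.eye(n))
    return np.linalg.svd(T, compute_uv=False)[m * n - m]

# Illustrative data (not taken from the text).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
D = np.diag([0.5, -1.0])
E = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # spans UN(2, C) over the scalars

# Evaluate f along the ray N = t * E with growing norm ||N||_2 = t.
values = [f(A, D, t * E) for t in (1.0, 1e2, 1e4, 1e6)]
for t, v in zip((1.0, 1e2, 1e4, 1e6), values):
    print(f"||N||_2 = {t:8.0e}   f(N) = {v:.3e}")
```

For $m = 2$ the proof suggests a decay of order $1/\|N\|_2$, which is what the printed values should display for large $t$.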


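Consider $\mathrm{SN}(m, \mathbb{C}) := \{N \in \mathrm{UN}(m, \mathbb{C}) : N(j, j+1) \ne 0,\ j = 1 : m-1\}$. The
set $\mathrm{SN}(m, \mathbb{C})$ consists of special upper triangular nilpotent matrices in the sense that
$N \in \mathrm{SN}(m, \mathbb{C}) \implies \mathrm{rank}(N) = m - 1$ and that $\mathrm{SN}(m, \mathbb{C})$ is dense in $\mathrm{UN}(m, \mathbb{C})$.
Since $f$ is Lipschitz continuous, we have

$$\sup\{ f(N) : N \in \mathrm{SN}(m, \mathbb{C})\} = \max\{ f(N) : N \in \mathrm{UN}(m, \mathbb{C})\}. \qquad (13.19)$$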


We need the following result which follows from the max-min characterization of singular
values. For completeness, we provide a proof.

Lemma 13.2 Let $A \in \mathbb{C}^{m \times n}$ and $\sigma_1(A) \ge \cdots \ge \sigma_p(A)$ be singular values of $A$,
where $p := \min(m, n)$. Let $V \in \mathbb{C}^{n \times n}$ be nonsingular and $U \in \mathbb{C}^{m \times m}$. Then
$\sigma_j(UAV) \le \|U\|_2 \|V\|_2\, \sigma_j(A)$, $j = 1 : p$.

Proof By Horn and Johnson [11, Theorem 7.3.8, p. 451], for $k = 1 : p$, we have

$$\sigma_k(A) = \max_{S :\, \dim(S) = k}\ \min_{x \in S \setminus \{0\}} \frac{\|Ax\|_2}{\|x\|_2},$$

where $S \subset \mathbb{C}^n$ is a subspace. Now,

$$\frac{\|UAVx\|_2}{\|x\|_2} \le \|U\|_2\, \frac{\|AVx\|_2}{\|Vx\|_2}\, \frac{\|Vx\|_2}{\|x\|_2} \le \|U\|_2 \|V\|_2\, \frac{\|Ay\|_2}{\|y\|_2},$$

where $y := Vx$. Since $V$ is nonsingular, we have $\sigma_k(UAV) \le \|U\|_2 \|V\|_2\, \sigma_k(A)$ for
$k = 1 : p$.
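A quick numerical check of the inequality in Lemma 13.2 can help fix the indexing. The sketch below uses random real matrices, which are an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))
U = rng.standard_normal((m, m))
V = rng.standard_normal((n, n))   # nonsingular with probability one

def singular_values(X):
    """Singular values in descending order."""
    return np.linalg.svd(X, compute_uv=False)

# Right-hand side ||U||_2 ||V||_2 sigma_j(A) for j = 1 : p, p = min(m, n).
bound = np.linalg.norm(U, 2) * np.linalg.norm(V, 2) * singular_values(A)
lhs = singular_values(U @ A @ V)

holds = bool(np.all(lhs <= bound + 1e-12))
print(holds)
```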


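Recall that $\ell$ is the number of distinct eigenvalues of $D$ and $\mathrm{eig}(D) = \{\lambda_1, \dots, \lambda_\ell\}$.


Theorem 13.12 Either $f(N) = \sigma_m(A, D + N) = 0$ on $\mathrm{UN}(m, \mathbb{C})$ or $f(N) \ne 0$
for all $N \in \mathrm{SN}(m, \mathbb{C})$. In particular, if $m = \ell$ then either $f(N) = 0$ on $\mathrm{UN}(m, \mathbb{C})$
or $f(N) \ne 0$ for all $N \in \mathrm{UN}(m, \mathbb{C})$.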


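Proof First, note that if $N \in \mathrm{SN}(m, \mathbb{C})$ then $\mathrm{rank}(D + N - \lambda I_m) \ge m - 1$ for all
$\lambda \in \mathbb{C}$, with equality when $\lambda \in \mathrm{eig}(D)$. This shows that $\mathrm{gm}(\lambda_j, D + N) = 1$, $j = 1 : \ell$,
for all $N \in \mathrm{SN}(m, \mathbb{C})$. Consequently, there exist unique natural numbers $m_1 \ge \cdots \ge m_\ell$ such that

$$J = J_{m_1}(\lambda_1) \oplus \cdots \oplus J_{m_\ell}(\lambda_\ell)$$

is the Jordan canonical form of $D + N$ for all $N \in \mathrm{SN}(m, \mathbb{C})$, where $J_{m_j}(\lambda_j)$ is the
Jordan block of size $m_j \times m_j$ with eigenvalue $\lambda_j$ for $j = 1 : \ell$. Thus $D + N \sim J$
for all $N \in \mathrm{SN}(m, \mathbb{C})$.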

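Now, suppose that $f(N_0) = 0$ for some $N_0 \in \mathrm{SN}(m, \mathbb{C})$. Let $N \in \mathrm{SN}(m, \mathbb{C})$.
Since $D + N \sim J \sim D + N_0$, there is a nonsingular matrix $S$ such that $D + N =
S^{-1}(D + N_0)S$. Then we have

$$I_m \otimes A - (D + N)^{T} \otimes I_n = I_m \otimes A - \big(S^{-1}(D + N_0)S\big)^{T} \otimes I_n
= (S^{T} \otimes I_n)\big(I_m \otimes A - (D + N_0)^{T} \otimes I_n\big)(S^{T} \otimes I_n)^{-1}.$$

By Lemma 13.2, we have

$$f(N) = \sigma_{mn-m+1}\Big((S^{T} \otimes I_n)\big(I_m \otimes A - (D + N_0)^{T} \otimes I_n\big)(S^{T} \otimes I_n)^{-1}\Big)
\le \|S^{-1}\|_2 \|S\|_2\, f(N_0) = 0.$$

This shows that $f(N) = 0$ for all $N \in \mathrm{SN}(m, \mathbb{C})$. Consequently, $f(N) = 0$ for
all $N \in \mathrm{UN}(m, \mathbb{C})$ as $f(N)$ is Lipschitz continuous and $\mathrm{SN}(m, \mathbb{C})$ is dense in
$\mathrm{UN}(m, \mathbb{C})$. On the other hand, if $f(N_0) \ne 0$ then we have $f(N_0) \le \|S^{-1}\|_2 \|S\|_2\,
f(N)$ showing that $f(N) \ne 0$ for all $N \in \mathrm{SN}(m, \mathbb{C})$. This proves the first assertion.

Finally, if $m = \ell$ then the diagonal entries of $D$ are distinct. Thus $D + N$ is
diagonalizable for all $N \in \mathrm{UN}(m, \mathbb{C})$. Hence either $f(N) = 0$ for all $N \in \mathrm{UN}(m, \mathbb{C})$
or $f(N) \ne 0$ for all $N \in \mathrm{UN}(m, \mathbb{C})$. This completes the proof.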


The proof of Theorem 13.12 shows that the following result holds. 


Corollary 13.3 Let $S \in \mathbb{C}^{m \times m}$ be nonsingular and let $\mathrm{cond}(S) := \|S^{-1}\|_2 \|S\|_2$.
Then we have

$$\frac{\sigma_m(A, B)}{\mathrm{cond}(S)} \le \sigma_m(A, S^{-1}BS) \le \mathrm{cond}(S)\, \sigma_m(A, B).$$
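The two-sided bound of Corollary 13.3 can be illustrated numerically. This is a sketch only: `sigma_m` is a hypothetical helper implementing $\sigma_{mn-m+1}(I_m \otimes A - B^{T} \otimes I_n)$, and the random matrices are assumptions made for the example.

```python
import numpy as np

def sigma_m(A, B):
    """Hypothetical helper: sigma_{mn-m+1}(I_m (x) A - B^T (x) I_n)."""
    n, m = A.shape[0], B.shape[0]
    T = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    return np.linalg.svd(T, compute_uv=False)[m * n - m]

rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
S = rng.standard_normal((m, m)) + 3.0 * np.eye(m)   # assumed nonsingular

cond_S = np.linalg.cond(S, 2)                        # ||S||_2 ||S^{-1}||_2
middle = sigma_m(A, np.linalg.solve(S, B @ S))       # sigma_m(A, S^{-1} B S)
lower = sigma_m(A, B) / cond_S
upper = cond_S * sigma_m(A, B)
print(lower, middle, upper)
```

When $S$ is unitary, $\mathrm{cond}(S) = 1$ and the bound collapses to the invariance used in (13.17).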


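We denote the Jordan canonical form of a matrix $X \in \mathbb{C}^{m \times m}$ by $\mathrm{JCF}(X)$. Let
$\mathrm{JCF}(m, \Lambda)$ denote the set of all $m \times m$ matrices in Jordan canonical form with
spectrum $\Lambda$. Thus

$$\mathrm{JCF}(m, \Lambda) := \{\mathrm{JCF}(X) : X \in \mathbb{C}^{m \times m},\ \mathrm{eig}(X) = \Lambda\}.$$

Let $\mathrm{Diag}(m, \Lambda)$ denote the set of $m \times m$ diagonal matrices with spectrum $\Lambda$, that is,

$$\mathrm{Diag}(m, \Lambda) := \{D \in \mathbb{C}^{m \times m} : D \text{ is diagonal and } \mathrm{eig}(D) = \Lambda\}.$$

Then $\mathrm{Diag}(m, \Lambda) \oplus \mathrm{UN}(m, \mathbb{C})$ consists of all $m \times m$ upper triangular matrices with
spectrum $\Lambda$ and

$$\mathrm{JCF}(m, \Lambda) = \mathrm{JCF}\big(\mathrm{Diag}(m, \Lambda) \oplus \mathrm{UN}(m, \mathbb{C})\big).$$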


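On the other hand, $\mathrm{Diag}(m, \Lambda) \oplus \mathrm{SN}(m, \mathbb{C})$ consists of $m \times m$ nonderogatory upper
triangular matrices with spectrum $\Lambda$. Indeed, if $D + N \in \mathrm{Diag}(m, \Lambda) \oplus \mathrm{SN}(m, \mathbb{C})$
then it is easy to see that $\mathrm{rank}(D + N - \lambda I_m) \ge m - 1$ for $\lambda \in \mathbb{C}$, with equality when
$\lambda \in \Lambda$. Consequently, we have $\mathrm{gm}(\lambda_j, D + N) = 1$ for $j = 1 : \ell$. Hence there exist
unique natural numbers $m_1 \ge \cdots \ge m_\ell$ such that

$$\mathrm{JCF}(D + N) = J_{m_1}(\lambda_1) \oplus \cdots \oplus J_{m_\ell}(\lambda_\ell),$$

where $J_{m_j}(\lambda_j)$ is the Jordan block of size $m_j \times m_j$ with eigenvalue $\lambda_j$ for $j = 1 : \ell$.
This shows that $D + N$ is nonderogatory. In fact, $\mathrm{gm}(\lambda_j, D + N) = 1$ for all $j =
1 : \ell$ and for all $N \in \mathrm{SN}(m, \mathbb{C})$. Hence we have

$$\mathrm{JCF}(D + N) = J_{m_1}(\lambda_1) \oplus \cdots \oplus J_{m_\ell}(\lambda_\ell) \quad \text{for all } N \in \mathrm{SN}(m, \mathbb{C}). \qquad (13.20)$$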
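As a small illustration of why a nonzero superdiagonal forces $D + N$ to be nonderogatory, the sketch below computes the rank of $D + N - \lambda I_m$ at each eigenvalue for one hand-picked example; the matrices `D` and `N` are assumptions chosen for the demonstration.

```python
import numpy as np

m = 3
D = np.diag([1.0, 1.0, 2.0])             # eigenvalue 1 repeated, so l = 2 < m
N = np.array([[0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])          # N(j, j+1) != 0, so N is in SN(3, C)
B = D + N

# Rank of B - lambda * I at each distinct eigenvalue of D.
ranks = {lam: int(np.linalg.matrix_rank(B - lam * np.eye(m))) for lam in (1.0, 2.0)}
# Geometric multiplicity = m - rank.
geometric_multiplicity = {lam: m - r for lam, r in ranks.items()}
print(ranks, geometric_multiplicity)
```

Each eigenvalue has geometric multiplicity one, so the Jordan form here is $J_2(1) \oplus J_1(2)$ whatever the values of the nonzero superdiagonal entries.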
We have seen that $d(A, B) = d(A, \mathrm{JCF}(B))$. Suppose that $B$ is nonderogatory.
Then there exists $D \in \mathrm{Diag}(m, \Lambda)$ such that $\mathrm{JCF}(B) = \mathrm{JCF}(D + N)$ for all $N \in
\mathrm{SN}(m, \mathbb{C})$. Consequently, we have

$$d(A, B) = d(A, \mathrm{JCF}(D + N)) = d(A, D + N) \ge \sigma_m(A, D + N)$$

for all $N \in \mathrm{SN}(m, \mathbb{C})$. This shows that

$$d(A, B) \ge \sup\{ f(N) : N \in \mathrm{SN}(m, \mathbb{C})\} = \max\{ f(N) : N \in \mathrm{UN}(m, \mathbb{C})\}.$$

Thus we have the following result. See Mengi [23], Kressner et al. [18], Gracia [10],
Malyshev [22], Lippert [20] for similar results.
Theorem 13.13 Let $B \in \mathbb{C}^{m \times m}$ be nonderogatory and upper triangular such that
$\mathrm{eig}(B) = \Lambda$. Set $D := \mathrm{diag}(B)$ and consider $f(N) := \sigma_m(A, D + N)$. Then

$$d(A, B) \ge \sup\{ f(N) : N \in \mathrm{SN}(m, \mathbb{C})\} = \max\{ f(N) : N \in \mathrm{UN}(m, \mathbb{C})\}.$$

Suppose that the maximum is attained at $N_0$ and that $f(N_0) \ne 0$. Also suppose that
there exist left and right singular vectors $U$ and $V$ of $T_{A, D+N_0}$ corresponding to $f(N_0)$
such that $U^*U = V^*V$ and $\mathrm{rank}(V) = m$. Define $\Delta A := -f(N_0)\, U V^{\dagger}$. Then

$$(A + \Delta A)V = V(D + N_0) \quad \text{and} \quad \|\Delta A\|_2 = f(N_0) = d(A, D + N_0).$$

If $m = \ell$ then $d(A, B) = f(N_0) = \sigma_m(A, D + N_0)$. If $m > \ell$ and $N_0 \in \mathrm{SN}(m, \mathbb{C})$
then $d(A, B) = f(N_0) = \sigma_m(A, D + N_0)$.


Proof By Theorem 13.6, we have $(A + \Delta A)V = V(D + N_0)$ and $\|\Delta A\|_2 =
f(N_0) = d(A, D + N_0)$. Now, if $m = \ell$, then $B$ is an $\ell \times \ell$ matrix with $\ell$ distinct
eigenvalues. Hence $B \sim D + N_0$. Consequently, $d(A, B) = d(A, D + N_0) =
f(N_0) = \sigma_m(A, D + N_0)$. On the other hand, if $N_0 \in \mathrm{SN}(m, \mathbb{C})$ when $m > \ell$, then
again $B \sim D + N_0$ as $D + N_0$ is nonderogatory. Hence $d(A, B) = d(A, D + N_0) =
f(N_0) = \sigma_m(A, D + N_0)$. This completes the proof.


Remark 13.3 Theorem 13.13 assumes that either $m = \ell$ or $N_0 \in \mathrm{SN}(m, \mathbb{C})$. Now
suppose that $N_0 \in \mathrm{UN}(m, \mathbb{C})$ but not in $\mathrm{SN}(m, \mathbb{C})$. Then $N_0(j, j+1) = 0$ for some
$j$. For any $\epsilon > 0$, replacing the zero entries in $N_0(j, j+1)$ with $\epsilon$, we have $N_\epsilon \in
\mathrm{SN}(m, \mathbb{C})$ such that $\|N_0 - N_\epsilon\|_2 = \epsilon$. Set $B_\epsilon := D + N_\epsilon$. Now assuming that $T_{A, B_\epsilon}$
has left and right singular vectors $U$ and $V$ corresponding to $f(N_\epsilon)$ such that $U^*U =
V^*V$ and $\mathrm{rank}(V) = m$, we have $d(A, B_\epsilon) = f(N_\epsilon)$. Then

$$|d(A, B) - d(A, B_\epsilon)| = |f(N_0) - f(N_\epsilon)| \le \|N_0 - N_\epsilon\|_2 = \epsilon.$$

This shows that if $N_0 \notin \mathrm{SN}(m, \mathbb{C})$ then $d(A, B)$ can be approximated arbitrarily
closely by $d(A, B_\epsilon)$ provided the conditions on the singular vectors are satisfied for
$f(N_\epsilon)$.


The main question now is to determine whether or not the conditions on the singular
vectors in Theorem 13.13 are satisfied at the point of maximum $N_0$. It is proved in
Malyshev [22] that these conditions are satisfied when $\ell = 1$ and $m = 2$. In that case

$$\delta_2(A, \Lambda) = \max\{ f(N) : N \in \mathrm{UN}(2, \mathbb{C})\} = f(N_0) = \sigma_2(A, \lambda I_2 + N_0).$$

Note that $N_0 = \begin{bmatrix} 0 & t_0 \\ 0 & 0 \end{bmatrix}$ for some $t_0 \in \mathbb{C}$. For the case when $m = \ell = 2$, it is shown
in Gracia [10] that the conditions on the singular vectors are satisfied and that

$$\delta_2(A, \Lambda) = \max\{ f(N) : N \in \mathrm{UN}(2, \mathbb{C})\} = f(N_0) = \sigma_2(A, D + N_0)$$

provided that $N_0 \ne 0$, that is, $N_0 \in \mathrm{SN}(2, \mathbb{C})$. We provide a unified simple proof of
these results using the subdifferential of $f$ at $N_0$. We show the condition $U^*U = V^*V$
is satisfied for $m \ge 2$ when the singular value $f(N_0)$ is simple. However, the rank
condition ($\mathrm{rank}(V) = m$) may or may not be satisfied. If the rank condition is not
satisfied then we have an analogue of Corollary 13.2 which ensures that some elements
in $\Lambda$ are eigenvalues of $A + \Delta A$ with $\|\Delta A\|_2 = f(N_0)$.

Define $F : \mathrm{UN}(m, \mathbb{C}) \to \mathbb{C}^{mn \times mn}$ by $F(N) := I_m \otimes A - (D + N)^{T} \otimes I_n$.
Then recall that $f(N) = \sigma_m(A, D + N) = \sigma_{mn-m+1}(F(N))$. We now show that if
$f(N_0)$ is a simple singular value then the associated left and right singular vectors
$U$ and $V$ of $T_{A, D+N_0}$ satisfy the condition $U^*U = V^*V$. In fact, in such a case $U^*V$
is upper triangular.


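Theorem 13.14 Suppose that $f(N_0) = \max\{f(N) : N \in \mathrm{UN}(m, \mathbb{C})\}$ and that
$f(N_0) \ne 0$. Suppose that the multiplicity of $f(N_0)$ is $r$. Let $\mathcal{U} \in \mathbb{C}^{mn \times r}$ and
$\mathcal{V} \in \mathbb{C}^{mn \times r}$ be isometries such that $F(N_0)\mathcal{V} = f(N_0)\,\mathcal{U}$ and $(F(N_0))^*\mathcal{U} = f(N_0)\,\mathcal{V}$.
Then we have

$$0 \in \mathcal{F}\big(\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}\big) \quad \text{for all } E \in \mathrm{UN}(m, \mathbb{C}),$$

where $\mathcal{F}(\cdot)$ denotes the field of values.
For each $E \in \mathrm{UN}(m, \mathbb{C})$, there exist left and right singular vectors $U$ and $V$ of
$T_{A, D+N_0}$ corresponding to $f(N_0)$ such that $\mathrm{Tr}(U^*VE) = 0$. In particular, if $m = 2$
then $U^*V$ is upper triangular and $U^*U = V^*V$.

Now, suppose that $f(N_0)$ is simple. Let $U$ and $V$ be left and right singular vectors
of $T_{A, D+N_0}$ corresponding to $f(N_0)$. Then $U^*V$ is upper triangular and we have
$U^*U = V^*V$.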


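Proof Note that for any $E \in \mathrm{UN}(m, \mathbb{C})$, the function $g_E(t) := f(N_0 + tE)$ attains
a maximum at $t = 0$. Hence $t = 0$ is a Clarke stationary point of $g_E(t)$. Consequently,
$0 \in \partial g_E(0) = \partial_E f(N_0)$. By Theorem 13.11, we have $\partial_E f(N_0) = \mathcal{F}\big(\mathcal{V}^*(E^{T} \otimes I_n)^*\mathcal{U}\big)$.
Hence we have

$$0 \in \mathcal{F}\big(\mathcal{V}^*(E^{T} \otimes I_n)^*\mathcal{U}\big) \quad \text{for all } E \in \mathrm{UN}(m, \mathbb{C}). \qquad (13.21)$$

Equivalently, we have $0 \in \mathcal{F}\big(\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}\big)$ for all $E \in \mathrm{UN}(m, \mathbb{C})$. Thus there
exists $x \in \mathbb{C}^r$ such that $x^*\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}x = 0$. Let $V \in \mathbb{C}^{n \times m}$ and $U \in \mathbb{C}^{n \times m}$ be
such that $\mathrm{Vec}(V) = \mathcal{V}x$ and $\mathrm{Vec}(U) = \mathcal{U}x$. Then we have

$$\mathrm{Tr}(U^*VE) = x^*\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}x = 0.$$

Now, if $m = 2$, then for $E = e_1 e_2^{T}$, we have $e_2^{T} U^*V e_1 = \mathrm{Tr}(U^*VE) = 0$. This
shows that $U^*V$ is upper triangular. Since $D + N_0$ is upper triangular, by Theorem 13.8,
we have $U^*U = V^*V$. This proves the results when $f(N_0)$ is multiple.

Now consider the case when $f(N_0)$ is simple. Then $\mathcal{U}$ and $\mathcal{V}$ are vectors in
$\mathbb{C}^{mn}$. Consequently, $\mathcal{F}\big(\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}\big) = \{\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}\}$. Hence by (13.21),
we have $\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V} = 0$ for all $E \in \mathrm{UN}(m, \mathbb{C})$. Let $U \in \mathbb{C}^{n \times m}$ and $V \in \mathbb{C}^{n \times m}$
be such that $\mathrm{Vec}(U) = \mathcal{U}$ and $\mathrm{Vec}(V) = \mathcal{V}$. Then $\mathrm{Tr}(U^*VE) = \mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V} =
0$ for all $E \in \mathrm{UN}(m, \mathbb{C})$. In other words, $\langle U^*V, E^* \rangle = 0$ for all $E \in \mathrm{UN}(m, \mathbb{C})$. Thus
$U^*V$ is orthogonal to all strictly lower triangular matrices in $\mathbb{C}^{m \times m}$ and hence $U^*V$
is upper triangular. Since $D + N_0$ is upper triangular, by Theorem 13.8, we have
$U^*U = V^*V$. This completes the proof.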
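The step from the Kronecker form to the trace in the proof rests on the identity $\mathrm{Tr}(U^*VE) = \mathrm{Vec}(U)^*(E^{T} \otimes I_n)\mathrm{Vec}(V)$. The sketch below checks it numerically, assuming column-stacking for $\mathrm{Vec}$ and random complex test matrices (both assumptions of the example), and also checks that an upper triangular matrix has zero trace against every strictly upper triangular $E$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3

def crandn(*shape):
    """Random complex matrix (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

U, V = crandn(n, m), crandn(n, m)
E = np.triu(crandn(m, m), k=1)          # strictly upper triangular: E in UN(m, C)

def vec(X):
    """Stack the columns of X (column-major order)."""
    return X.flatten(order="F")

lhs = np.trace(U.conj().T @ V @ E)
rhs = vec(U).conj() @ np.kron(E.T, np.eye(n)) @ vec(V)

# An upper triangular T satisfies Tr(T E) = 0 for strictly upper triangular E.
T = np.triu(crandn(m, m))
trace_TE = np.trace(T @ E)
print(abs(lhs - rhs), abs(trace_TE))
```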


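Remark 13.4 If $f(N_0)$ is multiple then Theorem 13.14 does not guarantee existence
of left and right singular vectors $U$ and $V$ of $T_{A, D+N_0}$ such that $U^*U = V^*V$. The
existence is guaranteed if there exists $x \in \mathbb{C}^r$ such that

$$0 = x^*\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}x \in \mathcal{F}\big(\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}\big)$$

for all $E \in \mathrm{UN}(m, \mathbb{C})$. In such a case, we choose $U \in \mathbb{C}^{n \times m}$ and $V \in \mathbb{C}^{n \times m}$ such that
$\mathrm{Vec}(U) = \mathcal{U}x$ and $\mathrm{Vec}(V) = \mathcal{V}x$. Then $\mathrm{Tr}(U^*VE) = x^*\mathcal{U}^*(E^{T} \otimes I_n)\mathcal{V}x = 0$ for
all $E \in \mathrm{UN}(m, \mathbb{C})$ showing that $U^*V$ is upper triangular. Since $D + N_0$ is upper
triangular, by Theorem 13.8, we have $U^*U = V^*V$. Equivalently, if there exists
$x \in \mathbb{C}^r$ such that

$$0 = x^*\mathcal{U}^*(e_j e_i^{T} \otimes I_n)\mathcal{V}x \in \mathcal{F}\big(\mathcal{U}^*(e_j e_i^{T} \otimes I_n)\mathcal{V}\big), \quad i = 1 : m-1,\ j = 2 : m,$$

then $U \in \mathbb{C}^{n \times m}$ and $V \in \mathbb{C}^{n \times m}$ such that $\mathrm{Vec}(U) = \mathcal{U}x$ and $\mathrm{Vec}(V) = \mathcal{V}x$ satisfy
the condition $U^*U = V^*V$. We have seen that such a vector $x \in \mathbb{C}^r$ exists when
$m = 2$. So, does there exist such a vector $x \in \mathbb{C}^r$ when $m > 2$? For $m > 2$, an
affirmative answer can be obtained if the joint field of values (i.e., the joint numerical
range) of $\mathcal{U}^*(e_j e_i^{T} \otimes I_n)\mathcal{V}$, $i = 1 : m-1$, $j = 2 : m$, contains 0.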


We have seen that there exist left and right singular vectors $U$ and $V$ of $T_{A, D+N_0}$
corresponding to $f(N_0)$ such that $U^*U = V^*V$ when $m = 2$ irrespective of the
multiplicity of the singular value $f(N_0) = \sigma_m(A, D + N_0)$. We now show that the
condition $\mathrm{rank}(V) = 2$ holds when $m = 2$ irrespective of the multiplicity of $f(N_0)$
provided that $N_0 \ne 0$. Compare our proof with those in Malyshev [22], Gracia [10],
Lippert [20].


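Theorem 13.15 Let $f(N_0) = \max\{\sigma_m(A, D + N) : N \in \mathrm{UN}(m, \mathbb{C})\}$. Suppose that
$f(N_0) \ne 0$. If $m = 2$ and $N_0 \ne 0$ then there exist left and right singular vectors $U$
and $V$ of $T_{A, D+N_0}$ corresponding to $f(N_0)$ such that $U^*U = V^*V$ and $\mathrm{rank}(V) = 2$.


Proof Note that $\mathrm{UN}(2, \mathbb{C}) = \left\{ \begin{bmatrix} 0 & t \\ 0 & 0 \end{bmatrix} : t \in \mathbb{C} \right\}$. By Theorem 13.14 there exist left
and right singular vectors $U$ and $V$ of $T_{A, D+N_0}$ corresponding to $f(N_0)$ such that
$U^*U = V^*V$.

If possible, suppose that $\mathrm{rank}(V) < 2$. Note that $\ker(V) = \ker(U)$. Since
$\mathrm{Tr}(V^*V) = 1$, we have $\dim \ker(V) = 1$. Let $x \in \ker(V)$ be nonzero. Then $AV -
V(D + N_0) = f(N_0)U \implies V(D + N_0)x = 0 \implies (D + N_0)x \in \ker(V)$. Hence $x$
is an eigenvector of $D + N_0$. Similarly, $A^*U - U(D + N_0)^* = f(N_0)V \implies
U(D + N_0)^*x = 0 \implies (D + N_0)^*x \in \ker(U) = \ker(V)$. Hence $x$ is an eigenvector
of $(D + N_0)^*$. This shows that $x$ is a common eigenvector of $D + N_0$ and $(D + N_0)^*$.

Note that $N_0 = \begin{bmatrix} 0 & t_0 \\ 0 & 0 \end{bmatrix}$ for some $t_0 \in \mathbb{C}$ and $t_0 \ne 0$. Hence $N_0 \in \mathrm{SN}(2, \mathbb{C})$. Now
consider the case when $\ell = 2$. Then $D = \mathrm{diag}(\lambda_1, \lambda_2)$ and $\lambda_1 \ne \lambda_2$. Thus $x$ is a
common eigenvector of

$$D + N_0 = \begin{bmatrix} \lambda_1 & t_0 \\ 0 & \lambda_2 \end{bmatrix} \quad \text{and} \quad (D + N_0)^* = \begin{bmatrix} \bar{\lambda}_1 & 0 \\ \bar{t}_0 & \bar{\lambda}_2 \end{bmatrix}.$$

Since $t_0 \ne 0$, it follows that $x = 0$, which is a contradiction. Hence $\mathrm{rank}(V) = 2$.
On the other hand, if $\ell = 1$ then $\lambda_1 = \lambda_2 = \lambda$ and $x$ is a common eigenvector of

$$D + N_0 = \begin{bmatrix} \lambda & t_0 \\ 0 & \lambda \end{bmatrix} \quad \text{and} \quad (D + N_0)^* = \begin{bmatrix} \bar{\lambda} & 0 \\ \bar{t}_0 & \bar{\lambda} \end{bmatrix}.$$

Since $t_0 \ne 0$, it follows that $x = 0$, which is a contradiction. Hence $\mathrm{rank}(V) = 2$.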


Remark 13.5 We mention that if $\ell = m = 2$ then it is important that $f(N)$ attains its
maximum at $N_0 \ne 0$, that is, $N_0 \in \mathrm{SN}(2, \mathbb{C})$. In such a case $D + N_0$ is nonderogatory
and hence we have

$$\delta_2(A, \Lambda) = d(A, D) = d(A, D + N_0) = f(N_0) = \sigma_2(A, D + N_0).$$

On the other hand, if $\ell = 1$ and $m = 2$ then it has been proved that $\delta_2(A, \Lambda) = f(N_0)$
even when the maximum is attained at $N_0 = 0$. In such a case, a rank-1 perturbation
$\Delta A$ can be constructed such that $\lambda$ is a multiple eigenvalue of $A + \Delta A$ and $\|\Delta A\|_2 =
f(N_0) = \delta_2(A, \Lambda)$; see Malyshev [22].


Remark 13.6 Recall that $d(A, B) = d(A, S^{-1}BS)$ for any nonsingular matrix $S$
and

$$\delta_m(A, \Lambda) = \inf\{d(A, B) : B \in \mathbb{C}^{m \times m},\ \mathrm{eig}(B) = \Lambda\}
= \inf\{d(A, J) : J \in \mathrm{JCF}(m, \Lambda)\}
= \min\{d(A, J) : J \in \mathrm{JCF}(m, \Lambda)\}.$$

Hence, in view of (13.20), if $m = \ell$ then for any $D \in \mathrm{Diag}(m, \Lambda)$, we have

$$\delta_m(A, \Lambda) = d(A, D) \ge \max\{f(N) : N \in \mathrm{UN}(m, \mathbb{C})\} = f(N_0)$$

and the equality holds when $f(N_0)$ is simple and $\mathrm{rank}(V) = m$. The equality also
holds when $U^*U = V^*V$ and $\mathrm{rank}(V) = m$. Here $U$ and $V$ are left and right
singular vectors of $T_{A, D+N_0}$ corresponding to $f(N_0)$. On the other hand, if $\ell < m$
then $D + N$ will have different JCF for different choices of $D \in \mathrm{Diag}(m, \Lambda)$. Then,
in view of Theorem 13.13 and Remark 13.3, $\delta_m(A, \Lambda)$ can be approximated arbitrarily
closely by taking the minimum of $d(A, J)$ over all nonderogatory $J \in \mathrm{JCF}(m, \Lambda)$.
Chapter 14
Equality of BLUEs for Full, Small,
and Intermediate Linear Models Under
Covariance Change, with Links to Data
Confidentiality and Encryption


Stephen J. Haslett and Simo Puntanen 


Mathematics Subject Classification (2010) 62J05 - 62J10 


14.1 Introduction 


Generally, when the error covariance structure in a linear model is changed, the 
BLUEs of estimable functions of its parameters also change. Rao [18] explored the 
necessary and sufficient conditions that the change in error covariance results in no 
change to the BLUEs. 

Rao’s conditions for allowable changes to a shared error covariance while still
retaining BLUEs may be applied separately to two models, one a submodel (the
“small model”) of the other (the “full model”). The necessary and sufficient condition
for BLUEs for the full model applies to all estimable functions of its parameters, not
just to estimable functions containing parameters only from the small model. BLUEs
of estimable functions of parameters that are in both models are not necessarily equal
in the full and small models, but for each model its own BLUEs remain unaltered.

We show that if both model and submodel have the same original covariance 
structure, the permissible changes to the error covariance for retention of BLUEs of 
all estimable functions of all parameters in both models are similar in block diagonal 
structure of the added quadratic form, but are nevertheless different to those for the 
two models considered separately. 

Calling the original model the “full model” and the submodel the “small model” 
allows explicit specification of any “intermediate models” between them, i.e., all 
models containing the same parameters as the small model plus some but not all the 
parameters in the full model that are not in the small model. It also permits discussion 
of restrictions on changes to error covariance so that, in addition to the full and small 
models, all intermediate models also retain their own BLUEs. The results developed 
apply regardless of the order in which regressors in a reparametrised version of the 
full model that are not in the small model (including any possible interaction terms) 
are added or removed. 

Haslett et al. [9] considered properties of BLUEs in full versus small linear models 
with new observations where the supplementary quadratic form added to the origi- 
nal covariance structure implicitly involves block diagonal matrices based only on 
functions of the design matrices of the linear models. 

Haslett et al. [10] have outlined in detail the permitted error covariance changes 
that retain BLUEs of estimable functions of parameters for both full and small fixed 
effect linear models, but the paper does not consider intermediate models or retention 
of the covariance of the BLUEs. 

The key questions addressed in this paper are 


1. When two nested fixed effect linear models (the “full” and the “small” models) 
are each to retain the equality of BLUEs for their own estimable functions of their 
own parameters, how are the changes in shared error covariance matrix that are 
permissible linked to block diagonal matrices? 

2. What additional restrictions on shared error covariance matrix are needed when 
two nested fixed effect linear models (the “full” and the “small” model) each must 
retain the equality of BLUEs as well as the covariance for estimable functions of 
their own parameters? 

3. What further limitations are placed on changes to the original shared error covari- 
ance matrix if BLUEs of estimable parameters are retained for all intermediate 
models as well as for the full and small models, and what further restrictions 
would allow the covariance of these BLUEs also to remain unchanged? 


Having established the necessary core results, the way they are related to data con- 
fidentiality and encryption is outlined. For data confidentiality, it is shown that given 
a pre-specified full model that may include interactions of any order, it is possible to 
construct any number of datasets that have exactly the same BLUEs and covariances 
of BLUEs for the estimable function of parameters as the original dataset, for all
possible submodels. These methods extend the results of Haslett and Govindaraju [7,
8] on confidentiality and encryption and simplify their implementation. The results
have roots in the data cloning literature, in the wider sense of non-identical datasets
that produce identical statistics. Also see, for example, Anscombe [1], Chatterjee
and Firat [2] and Govindaraju and Haslett [5]. Replications of the original data are 
another very simple form of cloning which, via Markov Chain Monte Carlo methods, 
can be used to obtain maximum likelihood estimators of the parameters in a range 
of models from linear models to generalised linear mixed models (Lele et al. [13]). 


Example 1
Let

$$Y = (1,\ 5,\ 9,\ 23,\ 36)', \qquad
Y_{new} = (1.132843,\ 5.3215964,\ 10.559413,\ 24.992382,\ 34.103914)',$$

$$X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \end{pmatrix}
\quad \text{and} \quad
X_{new} = \begin{pmatrix}
1.0298573 & 1.0298573 & 1.0298573 \\
1.0565343 & 2.2798898 & 4.7266007 \\
0.9799919 & 3.0624747 & 9.6284206 \\
0.9825015 & 4.1142250 & 17.377995 \\
0.9473309 & 4.7366547 & 23.683273
\end{pmatrix}.$$

The BLUE for the original data is

$$\hat{\beta} = (2.4,\ -3.2,\ 2.0)'.$$

For the same data, the BLUE for the submodel with the last column of $X$ deleted
is

$$\hat{\beta} = (-11.6,\ 8.8)',$$

and the BLUE using only the first column of $X$ is $\hat{\beta} = 14.8$.


Using $Y_{new}$ and $X_{new}$ produces exactly the same set of BLUEs, with exactly the
same covariances for these BLUEs as for the original dataset, $Y$ and $X$.


Why and how? We return to Example 1 in Sect. 14.7.
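The BLUEs quoted for the original data in Example 1 can be reproduced by ordinary least squares. The sketch below assumes that the original covariance is proportional to the identity, so that the BLUE coincides with the OLS estimator; that assumption is not stated in the example itself, but it does reproduce the printed submodel values. The helper name `ols` is illustrative.

```python
import numpy as np

# Original data of Example 1: columns of X are 1, x, x^2 for x = 1, ..., 5.
Y = np.array([1.0, 5.0, 9.0, 23.0, 36.0])
x = np.arange(1.0, 6.0)
X = np.column_stack([np.ones(5), x, x**2])

def ols(design, response):
    """Least-squares coefficients (the BLUE when cov(e) is proportional to I)."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

beta_full = ols(X, Y)           # full model: all three columns
beta_linear = ols(X[:, :2], Y)  # last column of X deleted
beta_const = ols(X[:, :1], Y)   # first column only
print(beta_full, beta_linear, beta_const)
```

Under this assumption the three fits are approximately (2.4, -3.2, 2.0), (-11.6, 8.8) and 14.8. Checking the claim about $Y_{new}$ and $X_{new}$ would additionally need the changed covariance matrix, which is only constructed in Sect. 14.7.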


14.2 Full Model $\mathcal{M}_{12}$ and Small Model $\mathcal{M}_1$


Consider the two nested linear models, $\mathcal{M}_{12}$ (the full model (14.1)) and $\mathcal{M}_1$ (the small
model (14.2)), both with fixed parameters:

$$Y = X_1\beta_1 + X_{02}\beta_{02} + e = X_0\beta_0 + e \qquad (14.1)$$

and

$$Y = X_1\beta_1 + e \qquad (14.2)$$

where $\mathrm{cov}(e) = V_0$ is non-negative definite. The subscript “0” is used to
designate this “original” model in (14.1) because we later find it useful to consider
reparametrised versions of $\mathcal{M}_{12}$.

The vector $Y$ is an $n$-dimensional observable response variable, and $e$ is an
unobservable random error with a known covariance matrix $\mathrm{cov}(e) = V_0 = \mathrm{cov}(Y)$ and
expectation $E(e) = 0$, where 0 is the zero vector. The matrices $X_1$, which is $n \times p_1$, and
$X_{02}$, which is $n \times p_{02}$, are known with $p_1 + p_{02} = p_0$. So $X_0$ is $n \times p_0$ where $p_0 < n$,
sometimes (though not necessarily) partitioned as $X_0 = (X_1 : X_{02})$. Neither $X_1$ nor
$X_{02}$ is necessarily of full (column) rank. The parameter vector is $\beta_0 = (\beta_1' : \beta_{02}')'$
where the symbol $'$ indicates transpose. Let $\mu = X_0\beta_0$.

Let the symbols $A^{-}$, $A^{+}$, $C(A)$, $C(A)^{\perp}$ denote a generalised inverse, the (unique)
Moore-Penrose inverse, the column space, and the orthogonal complement of the
column space of the matrix $A$. We denote any matrix satisfying $C(A^{\perp}) = C(A)^{\perp}$ as
$A^{\perp}$. Let $P_A = P_{C(A)} = AA^{+} = A(A'A)^{-}A'$ denote the orthogonal projector (with
respect to the standard inner product) onto $C(A)$. The orthogonal projector onto
$C(A)^{\perp}$ is designated as $Q_A = I_a - P_A$ where $I_a$ is the identity matrix with $a$
rows and columns. When $A = X = (X_1 : X_2)$, set $M = I_n - P_X$, $M_1 = I_n - P_{X_1}$,
and $M_2 = I_n - P_{X_2}$. An obvious choice for $X^{\perp}$ is $M$.
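The projector identities above are easy to confirm numerically, including for a matrix that is not of full column rank. The sketch below uses a made-up rank-deficient matrix `A` and an explicit cutoff of `1e-10` for the pseudoinverse; both are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
A = np.column_stack([A, A[:, 0] + A[:, 1]])   # 6 x 4, rank 3 on purpose

cutoff = 1e-10                                 # explicit singular-value cutoff
P = A @ np.linalg.pinv(A, cutoff)              # P_A = A A^+
Q = np.eye(6) - P                              # Q_A = I - P_A
# The Moore-Penrose inverse of A'A is one valid choice of generalised inverse.
P_alt = A @ np.linalg.pinv(A.T @ A, cutoff) @ A.T

checks = {
    "symmetric": np.allclose(P, P.T, atol=1e-8),
    "idempotent": np.allclose(P @ P, P, atol=1e-8),
    "fixes C(A)": np.allclose(P @ A, A, atol=1e-8),
    "Q annihilates A": np.allclose(Q @ A, 0.0, atol=1e-8),
    "AA^+ equals A(A'A)^-A'": np.allclose(P, P_alt, atol=1e-8),
}
print(checks)
```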

Following Rao [18, Theorems 5.2 and 5.3], irrespective of the rank of $V_0$, the necessary
and sufficient condition on a covariance matrix $V$ for a linear model

$$Y = X\beta + e \qquad (14.3)$$

with $\mathrm{cov}(e) = V_0$ to continue to have the same BLUE for $X\beta$ is that

$$V = V_0 + XAX' + V_0X^{\perp}B\,X^{\perp\prime}V_0. \qquad (14.4)$$

Note that $X$ and $X^{\perp}$ are orthogonal, that $A$ and $B$ are arbitrary, conformable, and
hence square matrices of a possibly different dimension, and that (although it will
be true when $V_0$ is of full rank) there is no requirement that $C(V_0X^{\perp}) \cup C(X) = \mathbb{R}^n$,
i.e., it is not necessary that $C(V_0^{\perp}) = \{0\}$ (where $C(V_0^{\perp})$ denotes the column space
orthogonal to $V_0$), or that the model is weakly singular, i.e., that $C(X) \subset C(V_0)$. Rao
[18] specified tighter conditions, but Mitra and Moore [15, p. 148] and Rao [19, p.
289] both noted that the proof in Rao [18] did not require that

$$C(X : V_0) = C(X : V_0M) = \mathbb{R}^n;$$

however if this additional condition holds, then $\mathrm{BLUE}(\mu)$ is unique.
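Condition (14.4) can be illustrated in the simplest setting. The sketch below assumes a positive definite $V_0$ and a full column rank $X$, so that $\mathrm{BLUE}(X\beta) = X(X'V^{-1}X)^{-1}X'V^{-1}Y$ is available in closed form, and takes $X^{\perp} = M$; the random matrices and the helper `blue_of_mu` are hypothetical and serve only the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
X = rng.standard_normal((n, p))              # full column rank (assumed)
L = rng.standard_normal((n, n))
V0 = L @ L.T + np.eye(n)                     # positive definite (assumed)

M = np.eye(n) - X @ np.linalg.pinv(X)        # the choice X_perp = M
U = rng.standard_normal((p, p))
U_perp = rng.standard_normal((n, n))
A, B = U @ U.T, U_perp @ U_perp.T            # symmetric, non-negative definite

# Covariance change of the form (14.4).
V = V0 + X @ A @ X.T + V0 @ M @ B @ M.T @ V0

def blue_of_mu(X, V, y):
    """BLUE of X beta for positive definite V: X (X'V^-1 X)^-1 X'V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)
    return X @ np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)

y = rng.standard_normal(n)
mu_original = blue_of_mu(X, V0, y)
mu_changed = blue_of_mu(X, V, y)
print(np.max(np.abs(mu_original - mu_changed)))
```

The agreement follows because the estimator $G = X(X'V_0^{-1}X)^{-1}X'V_0^{-1}$ satisfies $GX = X$ and $GVM = 0$ for every $V$ of this form.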


Although both are square, the requirement that $A$ or $B$ in (14.4) be symmetric can
be loosened. If a quadratic form $CEC'$ is itself symmetric, then by taking transposes
$CEC' = CE'C'$ and hence $CEC' = C\{(E + E')/2\}C'$. Because $(E + E')/2$ is symmetric,
and because both $V$ and $V_0$ are symmetric then, unless $XAX' + V_0X^{\perp}B\,X^{\perp\prime}V_0$ is
symmetric but its two component quadratic forms are not, $A$ and $B$ in Eq. (14.4)
can be set to be symmetric without loss of generality.

For $A$ and $B$ symmetric, set $A = UU'$ and $B = U_{\perp}U_{\perp}'$, so that Eq. (14.4) yields

$$V = V_0 + XUU'X' + V_0X^{\perp}U_{\perp}U_{\perp}'X^{\perp\prime}V_0. \qquad (14.5)$$


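14.3 Reparametrised Full Model $\mathcal{M}_{12t}$ and Small Model $\mathcal{M}_1$


Recall that $M_1 = I - P_{X_1}$, where $P_{X_1} = X_1(X_1'X_1)^{-}X_1'$ is the orthogonal projector
onto $C(X_1)$. Note that $P_{X_1}$ is invariant to the choice of generalised inverse $(X_1'X_1)^{-}$.
Let $X_{2|1} = P_{X_1^{\perp}}X_{02} = M_1X_{02}$. Then $P_{X_{2|1}} = P_{M_1X_{02}} = M_1X_{02}(X_{02}'M_1X_{02})^{-}X_{02}'M_1$
is the orthogonal projector onto $C(X_{2|1})$, which lies in the column space perpendicular to $C(X_1)$.
Note that $P_{X_{2|1}}$ is invariant to the choice of $(X_{02}'M_1X_{02})^{-}$. The transformation of $X_{02}$
to $X_{2|1}$ ensures that $X_1$ and $X_{2|1}$ are orthogonal.

Noting that generally $\beta_2 \ne \beta_{02}$, define a reparametrised full model $\mathcal{M}_{12t}$ by

$$\begin{aligned}
Y &= X_1\beta_1 + P_{X_1^{\perp}}X_{02}\beta_2 + e \\
  &= X_1\beta_1 + M_1X_{02}\beta_2 + e \\
  &= X_1\beta_1 + X_{2|1}\beta_2 + e \\
  &= X_t\beta_t + e
\end{aligned} \qquad (14.6)$$

again with $\mathrm{cov}(e) = V_0$.
For the BLUEs under $\mathcal{M}_{12}$, because $C(X_t) = C(X_1 : M_1X_{02}) = C(X_0)$,

$$\mathrm{BLUE}(\mu \mid \mathcal{M}_{12}) = \mathrm{BLUE}(\mu \mid \mathcal{M}_{12t}), \qquad (14.7)$$

so that without loss of generality we may set

$$X_{2|1}\beta_2 = X_{02}\beta_{02}. \qquad (14.8)$$

If required, $\beta_{02}$ (or its estimate) can be recovered from $\beta_2$ (or its estimate) via

$$\beta_{02} = (X_{02}'X_{02})^{-}X_{02}'X_{2|1}\beta_2 = (X_{02}'X_{02})^{-}X_{02}'M_1X_{02}\beta_2 \qquad (14.9)$$

and an appropriate choice of generalised inverse.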
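The two facts that drive the reparametrisation, namely that $M_1X_{02}$ is orthogonal to $X_1$ and that $(X_1 : M_1X_{02})$ spans the same column space as $(X_1 : X_{02})$, can be checked directly. The design matrices in the sketch below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 7
X1 = rng.standard_normal((n, 2))
# Make the second block deliberately correlated with X1.
X02 = rng.standard_normal((n, 2)) + X1 @ rng.standard_normal((2, 2))

M1 = np.eye(n) - X1 @ np.linalg.pinv(X1)     # I - P_{X1}
X2_given_1 = M1 @ X02                        # the transformed block M1 X02

X0 = np.column_stack([X1, X02])              # original full design
Xt = np.column_stack([X1, X2_given_1])       # reparametrised full design

P0 = X0 @ np.linalg.pinv(X0)                 # projector onto C(X0)
Pt = Xt @ np.linalg.pinv(Xt)                 # projector onto C(Xt)

orthogonal = np.allclose(X1.T @ X2_given_1, 0.0, atol=1e-10)
same_column_space = np.allclose(P0, Pt, atol=1e-10)
print(orthogonal, same_column_space)
```

Equal projectors mean equal column spaces, which is the reason the BLUE of $\mu$ is the same under both parametrisations in (14.7).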
The reason for using the reparametrised or transformed model $\mathcal{M}_{12t}$ is that the
full model in its original form, $\mathcal{M}_{12}$, does not provide any straightforward way to
specify the condition that the sets $\{V\}$ for $\mathcal{M}_{12}$ and $\mathcal{M}_1$ coincide, because it is not
generally possible to distinguish terms that contain both $X_1$ and $X_2$ from the quadratic
forms in only $X_1$ or only in $X_2$. This in turn leads to complications ensuring that
in $\mathcal{M}_1$, $C(V_0X_{12}^{\perp}) \cap C(X_2)$ does not overlap with part of $C(X_1)$. Note that, unless
$C(X_1) \cap C(X_2) = \{0\}$, $C(X_2 : V_0X_{12}^{\perp}) \cap C(X_1) \ne \{0\}$.

[Figure 14.1: block diagram of $U_0U_0'$. Key: must be zero to meet conditions of Rao (1971) theorems 5.2 and 5.3; can be non-zero.]

Fig. 14.1 Matrices of form $U_0U_0'$ ($n \times n$) which when added to $V_0$ are necessary and sufficient
for unchanged BLUEs for reparametrised full model $\mathcal{M}_{12t}$ and for small model $\mathcal{M}_1$, considered
separately, under the assumption that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$

For $\mathcal{M}_{12t}$, Eq. (14.5) can be written

$$\begin{aligned}
V = V_0 &+ X_1U_1U_1'X_1' + X_{2|1}U_{2|1}U_{2|1}'X_{2|1}' \\
&+ X_1U_1U_{2|1}'X_{2|1}' + X_{2|1}U_{2|1}U_1'X_1' \\
&+ V_0X_{12}^{\perp}U_{12\perp}U_{12\perp}'X_{12}^{\perp\prime}V_0.
\end{aligned} \qquad (14.10)$$

For $\mathcal{M}_1$ the parallel result is that

$$V = V_0 + X_1U_1U_1'X_1' + V_0X_1^{\perp}U_{1\perp}U_{1\perp}'X_1^{\perp\prime}V_0. \qquad (14.11)$$

Both (14.10) and (14.11) can be written in block diagonal form so that all their
quadratic forms can be contained in a square matrix of dimension $n \times n$ that we
denote by $U_0U_0'$. However, the internal structure of these $n \times n$ block diagonal matrices
is different for full and small models, as illustrated in Fig. 14.1.

Note that in Figs. 14.1, 14.2, 14.3, 14.4, and 14.5, although the block matrix of
principal interest $U_0U_0'$ is in usual matrix notation, the collection of row labels and
column labels is not. To return the pictorial representation to usual matrix notation,
row labels (which form a column in the figures) should be read in sequence as a row
vector which pre-multiplies $U_0U_0'$, and the column labels (which form a row in the
figures) should be read in sequence as a column vector which post-multiplies $U_0U_0'$.
Here, the figure label V0Xperp stands for $V_0X_{12}^{\perp}$, and V0X1perp for $V_0X_1^{\perp}$. When a matrix
product involving $X_1$, $V_0X_1^{\perp}$, $X_{2|1}$ and/or $V_0X_{12}^{\perp}$ is zero, this is represented in the figures
by setting the relevant part of $U_0U_0'$ to zero.
[Figure 14.2: block diagram of $U_0U_0'$. Key: must be zero to meet conditions of Rao (1971) theorems 5.2 and 5.3; can be and are set to zero; can be non-zero.]

Fig. 14.2 Matrices of form $U_0U_0'$ ($n \times n$) necessary and sufficient for unchanged BLUEs for
reparametrised full model $\mathcal{M}_{12t}$ and for small model $\mathcal{M}_1$, considered simultaneously, under the
assumption that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$

[Figure 14.3: block diagram of $U_0U_0'$. Key: must be zero to meet conditions of Rao (1971) theorems 5.2 and 5.3; can be and are set to zero; scalar and can be nonzero; can be non-zero.]

Fig. 14.3 Matrices of form $U_0U_0'$ ($n \times n$) necessary and sufficient for unchanged BLUEs for the
reparametrised full model $\mathcal{M}_{12tq}$ and for the small model $\mathcal{M}_1$, considered simultaneously, where
$X_{2|1q}$ is a vector, under the assumption that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$


Note that although in general $C(X_1) \cup C(X_{2|1}) \cup C(V_0X_{12}^{\perp}) \cup C(V_0^{\perp}) = \mathbb{R}^n$ and
$C(X_1) \cup C(V_0X_1^{\perp}) \cup C(V_0^{\perp}) = \mathbb{R}^n$, where $C(V_0^{\perp})$ is the column space orthogonal
to $C(V_0)$, Fig. 14.1 does not include any terms in $C(V_0^{\perp})$. This is not because
$C(V_0^{\perp}) = \{0\}$ necessarily, but because via Rao [18, Theorems 5.2 and 5.3] all terms
(if any) in (14.10) and (14.11) that involve $V_0^{\perp}$ must be zero. The assumption inherent
in Fig. 14.1, that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$, is not necessary. If it does not hold,
then it is because $C(V_0^{\perp}) \ne \{0\}$ and $C(X_{2|1}) \cap C(V_0^{\perp}) \ne \{0\}$, so that some part
of $C(X_{2|1}) \not\subset C(V_0X_1^{\perp})$. Then, when considering conditions on $V$ for retention
of BLUEs for $\mathcal{M}_{12t}$ and $\mathcal{M}_1$ simultaneously, $U_1U_{2|1}'$ must be zero to match the
result of Rao’s [18] theorems. For any intermediate models, exactly which parts of
$U_0U_0'$ must be zero is difficult to specify without orthogonalising the columns of
$X_{2|1}$. This orthogonalisation is discussed in Sect. 14.4 and comment made there
about removing the assumption that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$, but in the remainder
of Sect. 14.3 and the initial part of Sect. 14.4, to simplify the exposition, $V_0$ is
assumed weakly singular (i.e., $C(X) \subset C(V_0)$). Although weak singularity is a
more familiar condition, it is stronger than $C(X_{2|1}) \subset C(V_0X_1^{\perp})$; however, when
in the small model $\mathcal{M}_1$ the design matrix $X_1$ has only one column (a situation
important to data confidentiality and encryption), the two conditions become very
similar. As an aside, the condition Rao [18, Theorem 5.3] assumed unnecessarily,
namely $C(X : V_0) = C(X : V_0M) = \mathbb{R}^n$, is not implied by weak singularity, i.e., by
$C(X) \subset C(V_0)$, because although under weak singularity $C(X : V_0) = C(V_0)$, it is
not necessarily then true that $C(X : V_0) = \mathbb{R}^n$ (i.e., that $V_0$ is full rank).

[Figure 14.4: block diagram of $U_0U_0'$. Key: must be zero to meet conditions of Rao (1971) theorems 5.2 and 5.3; can be and are set to zero; can be nonzero but must be diagonal; can be non-zero.]

Fig. 14.4 Matrices of form $U_0U_0'$ ($n \times n$) necessary and sufficient for unchanged BLUEs for
reparametrised full model $\mathcal{M}_{12tq}$ and for small model $\mathcal{M}_1$, considered simultaneously, where $X_{2|1q}$
is a matrix, under the assumption that $C(X_{2|1}) \subset C(V_0X_1^{\perp})$

[Figure 14.5: block diagram of $U_0U_0'$. Key: can be non-zero; must be zero to meet conditions of Rao (1971) theorems 5.2 and 5.3; set to zero.]

Fig. 14.5 Matrices of form $U_0U_0'$ ($n \times n$) sufficient for unchanged BLUEs and covariances of
functions of estimable parameters for original full model $\mathcal{M}_{12}$ and for all its submodels including
the small model $\mathcal{M}_1$, considered simultaneously

Illustrated in Fig. 14.1 (by the differing blocks of $U_0 U_0'$ that need to be zero) is that the
sets of $V$ required by Rao's [18] condition for $\mathcal{M}_{12t}$ (and hence for $\mathcal{M}_{12}$) and for $\mathcal{M}_1$ to
each have the same BLUEs are not equivalent, even for weakly singular $V_0$. Additional
restrictions need to be placed on the structure of $V$ for both $\mathcal{M}_{12t}$ and $\mathcal{M}_1$,
so that the condition that the BLUEs be unchanged applies to both models simultaneously.
Put more formally, the subspaces in Eq. (14.10), namely $\mathcal{C}(X_1)$, $\mathcal{C}(X_{2|1})$,
and $\mathcal{C}(V_0 X_{12\perp})$, and in Eq. (14.11), namely $\mathcal{C}(X_1)$ and $\mathcal{C}(V_0 X_{1\perp})$, are not the same
partition of $\mathbb{R}^n \setminus \mathcal{C}(V_{0\perp})$ (i.e., of $\mathbb{R}^n$ excluding $\mathcal{C}(V_{0\perp})$).

Essentially, our representational convention for Fig. 14.1 suggests that what is
required in specifying $V$ for which BLUEs are retained in both $\mathcal{M}_{12t}$ and $\mathcal{M}_1$ when
$V_0$ is weakly singular is that all submatrices of the entire matrix $U_0 U_0'$ that need to
be zero in either model when they are considered separately need to be zero in both
models when they are considered together.

Without further loss of generality (based on $\mathcal{C}(V_0 X_{1\perp})$ and $\mathcal{C}(X_{2|1}) \cup \mathcal{C}(V_0 X_{12\perp})$
having exactly the same column space, since both only exclude $\mathcal{C}(X_1)$ from
$\mathbb{R}^n \setminus \mathcal{C}(V_{0\perp})$), specify the orthogonal basis for $V_0 X_{1\perp}$ as being subdivided into two
parts, one part in $\mathcal{C}(X_{2|1})$ and the other part in $\mathcal{C}(V_0 X_{12\perp})$.

For $\mathcal{M}_{12t}$, now consider a restricted form of Eq. (14.10)

$$V = V_0 + X_1 U_1 U_1' X_1' + X_{2|1} U_{2|1} U_{2|1}' X_{2|1}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0. \tag{14.12}$$

All terms in $U_1 U_{12\perp}'$, $U_{2|1} U_{12\perp}'$, and their transposes must be zero in Eq. (14.12) because
$X_{12\perp}' V (V_0 X_{12\perp})_{\perp} = 0$ via Rao [18, Theorem 5.2]. The term in $U_1 U_{2|1}'$ and its transpose
that appear in Eq. (14.10) are now zero by construction in Eq. (14.12).

For $\mathcal{M}_1$, the parallel result is that

$$V = V_0 + X_1 U_1 U_1' X_1' + V_0 X_{1\perp} U_{1\perp} U_{1\perp}' X_{1\perp}' V_0 \tag{14.13}$$

where $V_0 X_{1\perp} U_{1\perp} U_{1\perp}' X_{1\perp}' V_0$ is restricted so that

$$V_0 X_{1\perp} U_{1\perp} U_{1\perp}' X_{1\perp}' V_0 = X_{2|1} U_{2|1} U_{2|1}' X_{2|1}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 \tag{14.14}$$

i.e.,

$$V = V_0 + X_1 U_1 U_1' X_1' + X_{2|1} U_{2|1} U_{2|1}' X_{2|1}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 \tag{14.15}$$

which is exactly Eq. (14.12).
The structure for $V$ in Eqs. (14.12) and (14.15) is necessary and sufficient via this
extension of [18] because no other structure can ensure that the conditions required
for $\mathcal{M}_{12t}$ and $\mathcal{M}_1$ to retain their BLUEs simultaneously are met.
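
The role of the zero cross terms can be checked numerically. The following is only a sketch under stated assumptions: $V_0 = I$, randomly generated design matrices, $X_{2|1}$ taken to be the residual of $X_2$ after projection onto $X_1$, and hypothetical helper names (`proj`, `blue_fit`). It compares generalised least squares fitted values under $V_0$ with those under a block-diagonal $V$ of the form (14.12), and under a $V$ that also contains a cross term between $X_1$ and $X_{2|1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 12, 2, 3

X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
y = rng.normal(size=n)


def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.pinv(A)


def blue_fit(Xd, V, yv):
    """GLS fitted values, i.e. the BLUE of the mean under positive definite V."""
    Vi = np.linalg.inv(V)
    return Xd @ np.linalg.solve(Xd.T @ Vi @ Xd, Xd.T @ Vi @ yv)


# Assumption: X_{2|1} = (I - P_{X1}) X2, which is orthogonal to X1.
X21 = (np.eye(n) - proj(X1)) @ X2
# Orthonormal basis of the orthogonal complement of C(X1 : X2).
Q, _ = np.linalg.qr(X, mode="complete")
X12perp = Q[:, p1 + p2:]

V0 = np.eye(n)
U1 = rng.normal(size=(p1, p1))
U21 = rng.normal(size=(p2, p2))
U12 = rng.normal(size=(n - p1 - p2, n - p1 - p2))

# Block-diagonal structure: no cross terms between the three subspaces.
V_restricted = (V0 + X1 @ U1 @ U1.T @ X1.T
                + X21 @ U21 @ U21.T @ X21.T
                + V0 @ X12perp @ U12 @ U12.T @ X12perp.T @ V0)

# Same family, but with a generic non-zero cross term between X1 and X_{2|1}.
W = np.hstack([X1, X21])
G = rng.normal(size=(p1 + p2, p1 + p2))
V_cross = V0 + W @ G @ G.T @ W.T

full_ok = np.allclose(blue_fit(X, V_restricted, y), blue_fit(X, V0, y))
small_ok = np.allclose(blue_fit(X1, V_restricted, y), blue_fit(X1, V0, y))
full_cross_ok = np.allclose(blue_fit(X, V_cross, y), blue_fit(X, V0, y))
small_cross_ok = np.allclose(blue_fit(X1, V_cross, y), blue_fit(X1, V0, y))
```

In this sketch the full-model BLUE is unchanged under both covariance matrices, whereas the small-model BLUE is retained only when the cross term is absent, which is the point made by the restricted structure.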


So in Fig. 14.2, only the diagonal blocks of $U_0 U_0'$ can be non-zero when specifying
$V$ such that Rao's condition applies to $\mathcal{M}_{12t}$ and $\mathcal{M}_1$ simultaneously.


14.4 Models Intermediate Between $\mathcal{M}_{12t}$ and $\mathcal{M}_1$


The question may be extended to considering the requirements on $\{V\}$ for $\mathcal{M}_{12t}$, $\mathcal{M}_1$
and all models intermediate between them (interpretable as parameters being added
one at a time to $X_1$), each to have the same BLUEs.

However, there is a complication. 

Recall that $X_{2|1}$ was constructed to ensure that $X_{2|1}$ was orthogonal to $X_1$ in order
to apply Rao [18, Theorem 5.3]. In the same manner, when considering intermediate
models where we need to reassign the column space in sequence, we need to use a
transformed version of $X_{2|1}$ in which the columns are mutually orthogonal.

This can be achieved, for example, via the QR decomposition of $X_{2|1}$,

$$X_{2|1} = X_{2|1q} R_{2|1q} \tag{14.16}$$

where $X_{2|1q}$ is $n \times p_2$ with orthogonal columns and $R_{2|1q}$ is an upper triangular $p_2 \times p_2$
matrix. Note that $\mathrm{rank}(X_{2|1q} R_{2|1q}) = \mathrm{rank}(X_{2|1})$. Then defining $\beta_{2q} = R_{2|1q}\beta_2$,
it will be the columns in $X_{2|1q}$, along with the corresponding parameters in $\beta_{2q}$, that
can be deleted one at a time, rather than the columns of $X_{2|1}$. This new model is
designated $\mathcal{M}_{12tq}$. Again by a parallel argument to (14.7)

$$\mathrm{BLUE}(\mu \mid \mathcal{M}_{12}) = \mathrm{BLUE}(\mu \mid \mathcal{M}_{12t}) = \mathrm{BLUE}(\mu \mid \mathcal{M}_{12tq}). \tag{14.17}$$


Because they are orthogonal, the columns of $X_{2|1q}$ can be added or removed in
any order, so that all intermediate models are included, not just those in a particular
addition or deletion sequence. There is also no loss of applicability of the result,
at least algebraically, if there are interaction terms within $X_0$ in the original model
$\mathcal{M}_{12}$.
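
The orthogonalisation step can be sketched with NumPy's reduced QR decomposition. This is an illustration only: the design matrices are random, $X_{2|1}$ is assumed to be the residual of $X_2$ after projection onto $X_1$, and the variable names (`X21`, `X21q`, `R21q`) are hypothetical stand-ins for $X_{2|1}$, $X_{2|1q}$ and $R_{2|1q}$. Note that `numpy.linalg.qr` returns orthonormal columns, a special case of the orthogonal columns required above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 10, 2, 3
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))

# Assumption: X_{2|1} is X2 with its projection onto C(X1) removed.
P1 = X1 @ np.linalg.pinv(X1)
X21 = (np.eye(n) - P1) @ X2

# Reduced QR: X21 = X21q @ R21q, with X21q (n x p2) having orthonormal
# columns and R21q (p2 x p2) upper triangular.
X21q, R21q = np.linalg.qr(X21, mode="reduced")

reconstructs = np.allclose(X21q @ R21q, X21)
orthonormal = np.allclose(X21q.T @ X21q, np.eye(p2))
upper_triangular = np.allclose(R21q, np.triu(R21q))
same_rank = np.linalg.matrix_rank(X21q @ R21q) == np.linalg.matrix_rank(X21)
# The orthogonalised columns remain orthogonal to X1.
orthogonal_to_X1 = np.allclose(X1.T @ X21q, 0.0)
```

Because the columns of `X21q` are mutually orthogonal and orthogonal to `X1`, dropping any subset of them leaves the remaining columns unaffected, which is what allows them to be added or removed in any order.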

Consider first the case where $X_{2|1q} = x_{2|1q}$ is a vector, i.e., $\beta_{2q}$ is scalar, so only
one additional parameter and one additional column from $X_{2|1q}$ are added to $\mathcal{M}_1$. See
Fig. 14.3.

Because $V$ must still hold for both the full and the small model, the new requirement
on $V$ is a special case of Eqs. (14.12) and (14.15) with $X_{2|1q} = x_{2|1q}$:

$$V = V_0 + X_1 U_1 U_1' X_1' + x_{2|1q}\, u_{2|1q}^2\, x_{2|1q}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0. \tag{14.18}$$


The scalar term $u_{2|1q}^2$ in (14.18) is represented in $U_0 U_0'$ of Fig. 14.3 in light blue.
The extension to a second single parameter from $\beta_{2q}$ repeats the same requirement,
namely the addition of a single further diagonal element. Repeating the process can
continue to add only diagonal elements to $U_{2|1q} U_{2|1q}'$.
Hence, the condition on weakly singular $V$ so that $\mathcal{M}_{12tq}$, $\mathcal{M}_1$ and all models
intermediate between them have the same BLUEs for $X_{2|1q}\beta_{2q}$ is that

$$V = V_0 + X_1 U_1 U_1' X_1' + X_{2|1q}\,\mathrm{diag}(U_{2|1q} U_{2|1q}')\,X_{2|1q}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 \tag{14.19}$$

where diag designates a diagonal matrix.

Note that, if $X_{2|1q}\,\mathrm{diag}(U_{2|1q} U_{2|1q}')\,X_{2|1q}'$ in Eq. (14.19) is written in terms of $X_2$
rather than $X_{2|1q}$, then Eq. (14.19) can induce terms of the form $X F X'$ for some $F$
containing both $X_1$ and $X_2$.

Because $X_{2|1q}$ has only orthogonal columns, it is now possible to be explicit on
how the assumption that $\mathcal{C}(X_{2|1}) \subseteq \mathcal{C}(V_0 X_{1\perp})$ can be removed from the results in
Sects. 14.3 and 14.4 for full, small, and intermediate models. Each column vector in
$X_{2|1q}$ is either in $\mathcal{C}(V_0 X_{1\perp})$ or in $\mathcal{C}(V_{0\perp})$ but cannot be in both, since
$\mathcal{C}(V_0 X_{1\perp}) \cap \mathcal{C}(V_{0\perp}) = \{0\}$. So

$$V = V_0 + X_1 U_1 U_1' X_1' + X_{2|1q} U_{2|1q} U_{2|1q}' X_{2|1q}' + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 \tag{14.20}$$

holds for small and full models, provided that where a column of $X_{2|1q}$ does not
belong to $\mathcal{C}(V_0 X_{1\perp})$ the corresponding row and column in $U_{2|1q} U_{2|1q}'$ contain only
zeros, and that if intermediate models are also included then $U_{2|1q} U_{2|1q}'$ is diagonal.

The utility of the results for intermediate models is most obvious for linear models 
with polynomials as regressors and some types of experimental design. For polyno- 
mials, the required transformation can be achieved by fitting instead some class of 
orthogonal polynomials, and for many balanced experimental designs no transforma- 
tion to orthogonality is required. For models that do not have orthogonal regressors 
however, the results are more theoretical than practical. Interpretation is complicated 
because it is not the original regressors that are being removed one at a time in 
transiting via intermediate models from full model to small model or vice versa, but 
particular orthogonal linear combinations of those regressors. 

Consequently, since results are more easily interpretable in practical terms, for
intermediate models it is more straightforward to look at the subset of $\{V\}$ in (14.20)
which avoids the need to consider the structure of all terms involving $X_{02}$, $X_{2|1}$, $X_{2|1q}$,
or $X_1$ (except in so far as they determine $X_{12\perp}$). It also has another major advantage,
which is discussed in the next section.


14.5  BLUEs and Their Covariance 


The results in Sects. 14.2–14.4 concern retention of BLUEs under changes of error
covariance. Generally, however, covariances of the BLUEs of estimable functions of
parameters are not conserved, even for weakly singular $V_0$. However, it is possible
to limit the structure of $V$ in a way that both the BLUEs and their covariances are
conserved.
Consider

$$V = V_0 + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 \tag{14.21}$$

with $V_0$ and $U_{12\perp} U_{12\perp}'$ full rank.
Then following Puntanen et al. [17, p. 261, Problem 10.6], under $V_0$

$$\mathrm{cov}(\mathrm{BLUE}(X_0\beta_0)) = X_0 (X_0' V_0^{-1} X_0)^{-} X_0' \tag{14.22}$$

and under $V$

$$\mathrm{cov}(\mathrm{BLUE}(X_0\beta_0)) = X_0 (X_0' V^{-1} X_0)^{-} X_0'. \tag{14.23}$$

However, from (14.21), via the Sherman-Morrison-Woodbury formula, attributed to
Duncan [3] in Puntanen et al. [17, p. 301],

$$\begin{aligned}
V^{-1} &= (V_0 + V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0)^{-1} \\
&= V_0^{-1} - V_0^{-1} V_0 X_{12\perp} \{X_{12\perp}' V_0 V_0^{-1} V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{-1}\}^{-1} X_{12\perp}' V_0 V_0^{-1} \\
&= V_0^{-1} - X_{12\perp} \{X_{12\perp}' V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{-1}\}^{-1} X_{12\perp}'
\end{aligned} \tag{14.24}$$

so that, because $X_0' X_{12\perp} = 0$ (since $X_0$ and $X_{12\perp}$ are orthogonal),

$$X_0' V^{-1} X_0 = X_0' V_0^{-1} X_0 \tag{14.25}$$

and hence $\mathrm{cov}(\mathrm{BLUE}(X_0\beta_0))$ is identical under the change of error covariance from
$V_0$ to $V$ when $V$ is defined by (14.21).
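
A numerical check of this argument can be sketched as follows. It is an illustration under assumptions, not part of the derivation: $X_0$ and $V_0$ are randomly generated, `Xperp` is an orthonormal basis of the orthogonal complement of $\mathcal{C}(X_0)$ standing in for $X_{12\perp}$, and `UUt` is any positive definite matrix playing the role of $U_{12\perp} U_{12\perp}'$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X0 = rng.normal(size=(n, p))

# Orthonormal basis of the orthogonal complement of C(X0).
Q, _ = np.linalg.qr(X0, mode="complete")
Xperp = Q[:, p:]

# A positive definite V0 and a positive definite (hence full-rank) U U'.
A = rng.normal(size=(n, n))
V0 = A @ A.T + n * np.eye(n)
U = rng.normal(size=(n - p, n - p))
UUt = U @ U.T + np.eye(n - p)

# Covariance change restricted to the residual space of the full model.
V = V0 + V0 @ Xperp @ UUt @ Xperp.T @ V0

# Sherman-Morrison-Woodbury form of the inverse of V.
V0_inv = np.linalg.inv(V0)
inner = np.linalg.inv(Xperp.T @ V0 @ Xperp + np.linalg.inv(UUt))
V_inv_smw = V0_inv - Xperp @ inner @ Xperp.T

woodbury_ok = np.allclose(V_inv_smw, np.linalg.inv(V))
info_equal = np.allclose(X0.T @ np.linalg.inv(V) @ X0, X0.T @ V0_inv @ X0)


def cov_blue(Xd, Vmat):
    """X (X' V^{-1} X)^{-1} X' for full column rank X and positive definite V."""
    return Xd @ np.linalg.inv(Xd.T @ np.linalg.inv(Vmat) @ Xd) @ Xd.T


cov_equal = np.allclose(cov_blue(X0, V), cov_blue(X0, V0))
```

The correction term in the Woodbury expansion lies entirely in the column space of `Xperp`, so it vanishes when pre- and post-multiplied by `X0`, leaving the covariance of the BLUE unchanged.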

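Puntanen et al. [17, p. 261, Problem 10.7] generalise (14.22) and (14.23) by
replacing $V_0^{-1}$ by $V_0^{-}$ and $V^{-1}$ by $V^{-}$, i.e.,

$$\mathrm{cov}(\mathrm{BLUE}(X_0\beta_0)) = X_0 (X_0' V_0^{-} X_0)^{-} X_0' \tag{14.26}$$

and under $V$

$$\mathrm{cov}(\mathrm{BLUE}(X_0\beta_0)) = X_0 (X_0' V^{-} X_0)^{-} X_0'. \tag{14.27}$$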

However, the various available parallels to the Sherman-Morrison-Woodbury formula
for generalised inverses apply only under restricted conditions. Riedel [20] is
not directly relevant because the necessary condition on the two components of the
sum, that the rank of the sum equals the sum of the ranks, is not met. Fill and Fishkind
[4], who extended Riedel's results, noted the centrality of his rank condition even to
their extensions. Hung and Markham [11] had earlier proved a more general formula,
but this does not easily simplify in the way required here. Kala and Klaczyński [12,
Corollary 2(e)] is useful, but still restrictive as it applies here; their result applied to
$V$ is that, given $\mathcal{C}(X_{12\perp}' V_0) \subseteq \mathcal{C}(U_{12\perp} U_{12\perp}')$ and since $\mathcal{C}(V_0 X_{12\perp}) \subseteq \mathcal{C}(V_0)$,

$$\begin{aligned}
V^{+} &= V_0^{+} - V_0^{+} V_0 X_{12\perp} \{X_{12\perp}' V_0 V_0^{+} V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{+}\}^{(1)} X_{12\perp}' V_0 V_0^{+} \\
&= V_0^{+} - V_0^{+} V_0 X_{12\perp} \{X_{12\perp}' V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{+}\}^{(1)} X_{12\perp}' V_0 V_0^{+}
\end{aligned} \tag{14.28}$$

where the superscript (1) indicates a 1-generalised inverse, i.e., that $K K^{(1)} K = K$
for $K = \{X_{12\perp}' V_0 V_0^{+} V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{+}\}$.
So, provided

$$X_0' V_0^{+} V_0 X_{12\perp} = (X_0' V_0^{+})(V_0 X_{12\perp}) = X_0' (V_0^{+} V_0) X_{12\perp} = (X_0' V_0^{+} V_0)(V_0^{+} V_0 X_{12\perp}) = 0 \tag{14.29}$$

then

$$\begin{aligned}
X_0' V^{+} X_0 &= X_0' V_0^{+} X_0 - X_0' V_0^{+} V_0 X_{12\perp} \{X_{12\perp}' V_0 X_{12\perp} + (U_{12\perp} U_{12\perp}')^{+}\}^{(1)} X_{12\perp}' V_0 V_0^{+} X_0 \\
&= X_0' V_0^{+} X_0
\end{aligned} \tag{14.30}$$

which implies the equality of the covariances of BLUEs in (14.26) and (14.27),
since these results are invariant to the choice of generalised inverse.

Now $P_{\mathcal{C}(V_0)} = V_0 V_0^{+}$ is the projection operator into $\mathcal{C}(V_0)$ and when $V_0$ is weakly
singular (because then by definition $\mathcal{C}(X_0) \subseteq \mathcal{C}(V_0)$), the projection of $X_0$ into $\mathcal{C}(V_0)$
is $X_0$ itself, i.e., $V_0 V_0^{+} X_0 = P_{\mathcal{C}(V_0)} X_0 = X_0$, so that

$$X_{12\perp}' V_0 V_0^{+} X_0 = X_{12\perp}' (V_0 V_0^{+} X_0) = X_{12\perp}' (P_{\mathcal{C}(V_0)} X_0) = X_{12\perp}' X_0 = 0 \tag{14.31}$$

which implies (14.30) holds when $V_0$ is weakly singular, given
$\mathcal{C}(X_{12\perp}' V_0) \subseteq \mathcal{C}(U_{12\perp} U_{12\perp}')$.

To summarise, when $V_0$ is weakly singular the requirement for retention of BLUEs
and their covariance for estimable functions of parameters is that
$\mathcal{C}(X_{12\perp}' V_0) \subseteq \mathcal{C}(U_{12\perp} U_{12\perp}')$, a condition that can be simply met, for example by
ensuring $U_{12\perp} U_{12\perp}'$ is full rank.

As an aside in the explanation of the choice made in specifying $V$ in (14.21),
straightforward algebra shows that if Kala and Klaczyński [12, Corollary 2(e)] and
its conditions are applied instead to (14.4) with $A \neq 0$, i.e., to
$V = V_0 + X A X' + V_0 X_{\perp} B X_{\perp}' V_0$, or applied to $V = V_0 + X A X'$, then although the BLUEs of estimable
functions of parameters are unaltered by the change from $V_0$ to $V$, the covariance
of these BLUEs under $V_0$ and $V$ will necessarily be different. A construction of this
type, namely $V = V_0 + X X'$, is used by [18] to facilitate estimation of BLUEs when
$V_0$ is singular but not weakly singular, i.e., when $\mathcal{C}(X) \not\subseteq \mathcal{C}(V_0)$.


14.6 BLUPs for Mixed Models 


In principle, the results of Sects. 14.1–14.5 can be extended to mixed models of the
form

$$Y = X\beta + Z\gamma + u \tag{14.32}$$

for $\gamma$ and $u$ uncorrelated and $\mathrm{cov}(u) = V_{0u}$, $\mathrm{cov}(\gamma) = R$ (both non-negative definite),
by substitution of

$$X_0 = \begin{pmatrix} X & Z \\ 0 & I \end{pmatrix} \quad \text{and} \quad V_0 = \begin{pmatrix} V_{0u} & 0 \\ 0 & R \end{pmatrix} \tag{14.33}$$

noting that

$$X_{0\perp} = \begin{pmatrix} I \\ -Z' \end{pmatrix} X_{\perp}. \tag{14.34}$$


Considering intermediate models between a specified small mixed model and a full 
model may involve adding (or deleting) either fixed or random parameters or both. 
These two types of addition (or deletion) can be considered separately, one additional 
parameter or set of parameters at a time (as in Sect. 14.4). 


14.7 Data Cloning 


Let us return to Example 1 to uncover the trick.

The original fit is via ordinary least squares (OLS) so $V_0 = \sigma^2 I$ and in the example
(to allow for models to be reduced to their simplest form, e.g., only an intercept), the
only term added to $V_0$ to create $V$ is of the form, in line with (14.21):

$$V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0 = \sigma^4 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' \tag{14.35}$$

from (14.12). Use this new full rank $V$ and find its Cholesky factor or the square root
from its eigendecomposition, $V^{1/2}$. Pre-multiplying the original model by $V^{1/2}$ and using
different selections of $U_{12\perp}$ can produce an infinite collection of datasets that have
the same regression coefficients for all submodels as the original data as well as the
same parameter covariances. For $V_0 = \sigma^2 I$, the transformed model is

$$V^{1/2} Y = V^{1/2} X_1 \beta_1 + V^{1/2} X_{02} \beta_{02} + V^{1/2} e \tag{14.36}$$

and when the parameters in this transformed model are estimated using OLS, it will
produce the same BLUEs of estimable functions of parameters and their covariance
as for the OLS fit to the original data and, via the results in Sects. 14.1–14.5, the same
will be true of all submodels.
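
The cloning step for the OLS case can be sketched as below. This is a minimal illustration under stated assumptions rather than a reproduction of Example 1: the data are simulated from a quadratic regression, $\sigma^2 = 1$ is assumed so that $V_0 = I$, the symmetric square root from the eigendecomposition is used for $V^{1/2}$, and the names `Xperp`, `V_half`, `Yc`, `Xc` and `ols` are hypothetical.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(3)
n = 20
x = np.linspace(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x, x**2])          # full model: quadratic
Y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=n)
p = X.shape[1]

# Orthonormal basis of the orthogonal complement of C(X).
Q, _ = np.linalg.qr(X, mode="complete")
Xperp = Q[:, p:]

# V = V0 + V0 Xperp U U' Xperp' V0 with V0 = I (sigma^2 = 1 assumed).
U = rng.normal(size=(n - p, n - p))
V = np.eye(n) + Xperp @ U @ U.T @ Xperp.T

# Symmetric square root of V from its eigendecomposition.
w, E = np.linalg.eigh(V)
V_half = E @ np.diag(np.sqrt(w)) @ E.T

# Cloned data: the original model pre-multiplied by V^{1/2}.
Yc, Xc = V_half @ Y, V_half @ X


def ols(Xd, yd):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(Xd, yd, rcond=None)[0]


# Compare OLS coefficients for every non-empty subset of columns.
subsets = [list(c) for k in range(1, p + 1) for c in combinations(range(p), k)]
same_coef = all(
    np.allclose(ols(X[:, c], Y), ols(Xc[:, c], Yc)) for c in subsets
)
# Equal Gram matrices give equal (X'X)^{-1}, hence equal coefficient
# covariances when sigma^2 is treated as known.
same_gram = np.allclose(X.T @ X, Xc.T @ Xc)
response_changed = not np.allclose(Y, Yc)
```

Different draws of `U` give different cloned responses `Yc` with the same coefficients for the full model and its submodels. In this special case with $V_0 = I$ the columns of `X` are eigenvectors of $V$ with eigenvalue one, so the transformation alters only the part of `Y` orthogonal to $\mathcal{C}(X)$.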

For any full rank $V_0$, the required transformation is

$$(V^{1/2} V_0^{-1/2}) Y = (V^{1/2} V_0^{-1/2}) X_1 \beta_1 + (V^{1/2} V_0^{-1/2}) X_{02} \beta_{02} + (V^{1/2} V_0^{-1/2}) e. \tag{14.37}$$

$V_0$ can instead be weakly singular, rather than $V_0 = \sigma^2 I$ or $V_0$ and/or $V$ being full
rank, and $\mathrm{BLUE}(X_0\beta_0)$ and its covariance will again be retained by the cloned data;
see (14.38). When $V_0$ is singular but not weakly singular, the BLUEs of estimable
functions of original and cloned data will coincide, but for the reasons outlined in
Sect. 14.5, it is an open question whether the covariances of the BLUEs always will.

For weakly singular $V_0$ with $\mathcal{C}(X_{12\perp}' V_0) \subseteq \mathcal{C}(U_{12\perp} U_{12\perp}')$ and original data $\{Y : X_0\}$,
the cloned data is $\{Y_c : X_{1c} : X_{02c}\} = \{Y_c : X_c\}$ and the model is

$$[V^{1/2} (V_0^{+})^{1/2}] Y = [V^{1/2} (V_0^{+})^{1/2}] X_1 \beta_1 + [V^{1/2} (V_0^{+})^{1/2}] X_{02} \beta_{02} + [V^{1/2} (V_0^{+})^{1/2}] e \tag{14.38}$$

i.e.,

$$Y_c = X_{1c}\beta_1 + X_{02c}\beta_{02} + e_c = X_c \beta_0 + e_c \tag{14.39}$$

where via Sect. 14.5,

$$\mathrm{cov}(\mathrm{BLUE}(X_c\beta_0)) = \mathrm{cov}(\mathrm{BLUE}(X_0\beta_0)).$$


So, via (14.21), the model using the cloned data and the model using $V_0$ will produce
the same BLUEs of estimable functions of the parameters and covariances for these
BLUEs for all submodels as does the original data fitted using $V_0$.

Further, because what has been added to $V_0$ is only $V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0$ with
$\mathcal{C}(X_{12\perp}' V_0) \subseteq \mathcal{C}(U_{12\perp} U_{12\perp}')$, this addition to $V_0$ will work for all submodels. See
Fig. 14.5. Any choice made from $\{U_{12\perp} U_{12\perp}'\}$ need not be used for all submodels,
but if a single cloned and hence confidentialised dataset is supplied to a user (so as to
minimise data transfer, and complexity of analysis) then $\{U_{12\perp} U_{12\perp}'\}$ would be fixed
for each user.

Because there are no terms in $X_{2|1} U_{2|1} U_{2|1}' X_{2|1}'$ included, there is no need to
orthogonalise to form $X_{2|1}$ or $X_{2|1q}$. Of course, orthogonalisation of $X$ would be
straightforward for Example 1 by specifying the original model using orthogonal
polynomials, but in general orthogonalisation would be an extra unnecessary step
and would complicate interpretation. The addition only of $V_0 X_{12\perp} U_{12\perp} U_{12\perp}' X_{12\perp}' V_0$
focuses changes to $V_0$ on the column space of the errors and residuals for the full
model, which is necessarily contained in the corresponding column spaces for the
error and residuals of the small and all the intermediate models. It also ensures
retention of the covariances of BLUEs of estimable functions of model parameters
for the full model and all its submodels.


14.8 Conclusions 


Sections 14.2–14.6 have outlined necessary and sufficient conditions for simultaneous
retention of BLUEs of estimable functions for full, small, and all intermediate
models under changes in error covariance. Generally, however, because the additions
allowed to the original error covariance $V_0$ to form $V$ include terms in $X_0$, the
covariances of these BLUEs are not maintained.

One implication of Sect. 14.7 is that by excluding terms in $X_0$, all new datasets
generated using the results there give the same BLUEs of estimable parameters
and these BLUEs also have the same covariances as for the original dataset. This
additional BLUE covariance equality property is what makes these results so useful
in applications, e.g., for data cloning: data visualisation, smoothing, confidentiality,
and encryption. Data confidentiality and encryption are important internationally.
Encryption is important to national statistical agencies and to industry, and both are
important to various government agencies.

Haslett and Govindaraju [8] note that for the pivot methods they used, because they 
did not preserve a sufficiently wide range of statistical properties, data confidential- 
isation applications were restricted. The additional material here in Sects. 14.2—14.7 
broadens the possibilities by outlining how a given dataset can be adjusted via a 
changed error covariance matrix in a way that preserves BLUEs via fitting a satu- 
rated (or some other suitable) linear model. The subset of such possible changes to 
error covariance outlined in Sect. 14.7 also preserves the covariance of the BLUEs. 
These new methods consequently provide a wider range of statistical uses than pivot 
methods. 

The new methods would allow statistical agencies to provide confidentialised 
data that, rather than containing the random error which many confidentialisation 
methods currently add to datasets, maintained analytic results for linear models 
using any subset of the original full model via its cloned form. By including vectors 
of ones for subgroups in the design matrix of the full model fitted by the agency to 
the un-confidentialised data, because results in Sect. 14.7 show that using cloned data 
based on (14.21) does not change the estimates of the parameters associated with 
the columns of this original design matrix, even means of subgroups and overall 
means can still be estimated from the cloned data without additional error and with 
the correct variances. The full model initially fitted by the agency might also include 
interaction terms where relevant. This might be simply extended to confidentialised 
datasets for fitting fixed effects in mixed linear models by incorporating random 
effects into Vo. 

Methods needed for generalised linear models would be more complicated. The
results in Sects. 14.2–14.7 assume that when the error covariance $V_0$ is known or
estimated as $\hat{V}_0$ and used in the full model, it is not changed. So there are limitations
to confidentialised data when iterative procedures (such as iterated generalised least
squares, IGLS) are to be used by external users for analysis. The exceptions are
when the data supplied by the agency is from the final iteration of the full model
fit (as would be suitable for estimating fixed parameters in mixed linear models) or
provides extensive back-computation information (as would be necessary for generalised
linear models). This is because to be able to reverse the iterative process from
the confidentialised data at the final iteration of IGLS for its underlying linear model
requires interim results from the fit that preceded it in the forward (rather than the
reverse or backward) direction. See del Pino [16] for further discussion of the role
and basis of IGLS.

For full (linear) models fitted by the agency that have error covariance that is a
multiple of the identity matrix or that have error covariance specified by incorporating
random effects into $V_0$ (to allow estimation of the fixed effects in a mixed model),
there is no complication in principle with using an estimated (and hence known) error
covariance $\hat{V}_0$ when fitting the full model to the original data rather than known $V_0$.
The confidentialised data $\{Y_c : X_c\}$ can be provided to data users along with $V_0$ or $\hat{V}_0$
(simplest and of minimum size when $V_0 = \sigma^2 I$, and containing no more than $n(n+1)/2$
elements if it has internal structure), while the agency retains the key $V$, which can
be different or the same for different data users without any additional compromise
to confidentiality.

The cloning mechanism has focused on a particular dependent variable $Y$ but,
subject to conditions outlined below, may nevertheless still produce cloned data with
the same BLUEs and covariances for estimable functions for different dependent
variables using the same cloned dataset. This is a potentially important property
because the same cloned dataset may then be used for the analysis of several different 
dependent variables, only one of which has been used to produce the cloned data. It 
broadens the utility of cloned data when it is used for confidentialisation. 

The same transformation $V^{1/2}(V_0^{+})^{1/2}$ in (14.38) is applied to both $Y$ and
$(X_1 : X_{02}) = X_0$. This gives the impression that, for a given cloned dataset, the roles of
Y and one of the columns of Xo might be interchanged so that the cloned dataset 
would also provide the correct estimates and standard errors for a different dependent 
variable. However, the original Y cannot be included in the new design matrix and 
still retain the required BLUE properties, because it is not a column in the original 
Xo and consequently not included in the original cloning process. So, despite (14.38) 
suggesting otherwise, interchange is not possible. However, whether the new Y is 
from the columns of $X_0$ or not, choosing a new $Y$ and having it transformed by
$V^{1/2}(V_0^{+})^{1/2}$ is permitted, provided the covariance of the error in the new model
belongs to the set {V}. Feasible new error covariance structures include any rescaling 
of Vo such as would occur when the internal structure of Vo is a property of the study 
design. The possibly marked increase in cloned data utility when the same cloned 
data can be used for several dependent variables does however have a price, albeit 
relatively small. Firstly, each suitable transformed new Y must be included in the 
cloned dataset supplied by the agency. Secondly, the agency needs to verify the 
required error covariance property holds for each dependent variable of interest to 
the users. If it does not conform, a separate cloned dataset will be required for the 
new Y; the advantage is that this additional cloned dataset can easily be set up to 
include the original Y as a predictor variable. 

The methods of Sect. 14.7 provide an alternative to the usual confidentialised unit 
record files (CURFs) used internationally to release unit record data by national 
statistical agencies. Unlike CURFs, when analysed statistically, data produced by
cloning methods give the same BLUEs and covariances as the original data. Additional
research is however warranted if the dataset is from a complex survey, because then
even linear model fitting ought not to use standard statistical methods (Lumley [14]).

As also noted by Haslett and Govindaraju [8], an "important use for cloned data
may be for encryption of statistical databases, and of relational databases via data
cubes, irrespective of whether the 'same regression' properties are of any interest at
all". The methods in Sects. 14.2–14.5, not just those in the current section, improve
the results given by Haslett and Govindaraju [8] for encryption, because they do not
require the pivots needed there. The encryption key needed to recover the original
data is now the particular $V$ and $V_0$ that have been used. Although $V$ and $V_0$ do require
storage additional to the cloned data, encryption via the methods in Sects. 14.2–14.7
does not increase data volume of the encrypted data itself (since there are still the
same number of variables in total). Encrypted datasets can be stored on site (separate
from the encryption key of course) or moved between secure areas.

In a more general context, even when the original data are known, cloned datasets
might be generated and amalgamated to increase apparent sample size as an aid to
model fitting via residual diagnostics. Whether $U_{12\perp} U_{12\perp}'$ would best be constrained
in this type of application, for example by setting an upper bound on some form
of its norm or to have distributional assumptions imposed (which would introduce
a Bayesian element), are open questions. What is central to understand, however,
is that any apparent increase in statistical power from combining multiple copies of
cloned data is not a free good. It is instead the consequence of making the final model
fit no longer conditional on only one particular model but instead conditional on the
entire (or possibly some reduced or probability weighted) set of all models with the
same BLUEs and covariances for estimable functions of the model parameters.


Data with Applications


Sudeep R. Bapat 


Mathematics Subject Classification 60E05 - 62N01 - 62N02 


15.1 Introduction 


The statistical literature is rich in ideas for different censoring schemes.
Some of the widely used ones include right censoring, left censoring, interval censor- 
ing and double censoring. Some of the key applications where censoring is observed 
include biometry, clinical trials, survival analysis, medical follow-up studies and epi- 
demiology. At the start, we will quickly review some of these censoring ideas, just 
for completeness. 


(i) Right censoring is seen when the subject leaves the study before the event occurs.
Hence, if the subject leaves the study at time $t_r$, the event occurs in $(t_r, \infty)$.
For example, consider patients in a clinical trial to study the effect of treatment 
on a particular disease. The data becomes right censored if the patient does 
not develop the disease symptoms before the study ends. Further, Type I right 
censoring occurs when the study ends at a pre-specified time, whereas Type II 
right censoring is seen if the study ends after a fixed number of failures are 
observed. 

(ii) Left censoring happens when the event of interest has already occurred before 
the patient enrols for the study. However, such censoring is rarely observed in 
practice. 


One may refer to [25] where a general right censoring scheme is discussed, [26] who
discuss a few censoring issues in survival analysis, [28] who present assumptions
of right censoring in the presence of left truncation, [15] which gives a particular
hybrid censoring scheme, [23] which considers Bayesian inference under progressive
censoring and [24] which outlines a Type II progressive hybrid censoring scheme,
among many others, for more extensions and applications.

This survey paper, however, focuses on a different kind of censoring scheme
which is relatively new and emerging, called "middle censoring". This is also a less
restrictive and more general censoring scheme. Middle censoring occurs if a data 
point becomes unobservable if it falls inside a random interval. The data point in 
such situations includes a censorship indicator and a random interval. Clearly, right 
censoring, left censoring and double censoring happen to be special cases of middle 
censoring, by taking suitable choices for the censoring intervals. Middle censored 
observations can be found in a variety of real-life applications. To name a few, 
in a lifetime study if the subject is temporarily withdrawn, we get an interval of 
censorship, or in biomedical studies the patient may be absent from the study for a 
short period, during which the event of interest occurs. 

A brief construction of a middle censored setup is as follows: Let there be $n$ items
under study, with lifetimes $T_1, T_2, \ldots, T_n$. For each item, there exists a random
censoring interval $(L_i, R_i)$, within which observations cannot be made. The point to note is
that the censoring interval is random, and independent of the lifetimes. The observed
data in this case hence happen to be

$$Z_i = \begin{cases} T_i & \text{if } T_i \notin [L_i, R_i] \\ (L_i, R_i) & \text{otherwise.} \end{cases} \tag{15.1}$$
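
The construction above can be sketched in a short simulation. This is illustrative only: the exponential distributions for the lifetimes and for the censoring intervals, and the variable names, are assumptions made here and are not taken from the papers under review.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8

T = rng.exponential(scale=1.0, size=n)        # lifetimes T_i (assumed exponential)
L = rng.exponential(scale=1.0, size=n)        # left end-points L_i
R = L + rng.exponential(scale=0.5, size=n)    # right end-points, so that L_i < R_i

# An observation is middle censored when T_i falls inside [L_i, R_i].
censored = (T >= L) & (T <= R)

# Observed data: the exact lifetime if uncensored, otherwise the interval.
observed = [
    (float(L[i]), float(R[i])) if censored[i] else float(T[i])
    for i in range(n)
]
```

Each entry of `observed` is either a single number (an exact lifetime) or a pair giving the censoring interval, mirroring the two cases of Eq. (15.1).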


Looking back, [20] first introduced this setup. Since then, numerous related papers 
have been published where the authors have tried to extend the model under differ- 
ent distributional and non-parametric setups. A well-documented work containing 
overviews of estimation strategies under middle censoring is that by [6], which is 
also the author’s Ph.D. thesis. The author mainly discusses important contributions 
in middle censoring prior to the year 2011. However, a lot has been done since then, 
which we will try to review in this paper, along with some interesting applications. A 
brief outline of the following sections is as follows: We discuss the non-parametric 
middle censoring estimation approach as proposed by [20] in Sect. 15.2. Section 15.3 
consists of an overview of some estimation techniques for middle censored data under 
some parametric models. We focus mainly on the exponential [16], Weibull [30] and 
Burr-XII [2] type lifetime distributions. As an extension to the exponential model, 
we also discuss estimation under middle censored competing risks data following an 
exponential distribution [4]. We briefly outline both frequentist and Bayesian esti- 
mation approaches. Section 15.4 gives a flavour of estimation under discrete middle 
censored lifetimes. Specifically, we cover situations where the lifetimes follow the 
geometric distribution [12] and a multinomial distribution [17]. Additionally, we 
also review estimation for middle censored data under the presence of covariates 
[19]. In Sect. 15.5, we cover parametric estimation of lifetimes which are middle 
censored and follow a circular distribution, namely the von Mises distribution [21], 
while Sect. 15.6 contains similar methodologies for an additive risks regression model 
[31]. A slightly different approach to estimation, namely the martingale method, is 
discussed here. Section 15.7 presents brief conclusions and a few final remarks.


15.2 Non-parametric Estimation for Middle Censored Data


Reference [20] introduced a self-consistent estimator (SCE) and a non-parametric
maximum likelihood estimator (NPMLE) for middle censored data. A short overview
is as follows: Let $T_i$ be a sequence of independent and identically distributed random
variables, with unknown distribution function $F_0$. Further, let $Y_i = (L_i, R_i)$ be a
sequence of i.i.d. random vectors, independent of the $T_i$'s, with an unknown bivariate
distribution $G$ such that $P(L_i < R_i) = 1$. We thus observe the following dataset:

$$Z_i = \begin{cases} T_i & \text{if } \delta_i = 1 \\ (L_i, R_i) & \text{if } \delta_i = 0, \end{cases} \tag{15.2}$$

where $\delta_i$ is the indicator function which takes the value 1 if $T_i \notin (L_i, R_i)$, i.e., if
$T_i$ is observed exactly. This is similar to Eq. (15.1) and a typical construction which is
found in related middle censoring problems, which we will explore in later sections.

Now if one tries to estimate the distribution function using the EM algorithm, the
resulting equation takes the form

$$F(t) = E_F[E_n(t) \mid Z], \tag{15.3}$$

where $Z$ refers to the set from Eq. (15.2) and $E_n$ is the empirical distribution
function, as described by [34]. This is also referred to as the self-consistency
equation in the literature. It so turns out that under a middle censoring case, the SCE
satisfies the equation:

$$F_n(t) = \frac{1}{n}\sum_{i=1}^{n}\left[\delta_i I(T_i \le t) + \bar{\delta}_i I(R_i \le t) + \bar{\delta}_i I(t \in (L_i, R_i))\,\frac{F_n(t) - F_n(L_i)}{F_n(R_i-) - F_n(L_i)}\right] \tag{15.4}$$

where $\bar{\delta}_i = 1 - \delta_i$. Again, an analytical solution to the above equation does not exist
and one has to resort to an EM algorithm setup. An updated estimator using an
iterative formula is given by

$$F^{(m+1)}(t) = E_{F^{(m)}}[E_n(t) \mid Z]. \tag{15.5}$$


The authors further show that in this case, the NPMLE also satisfies the self- 
consistency equation given in Eq. (15.4). Further, if there exist any uncensored 
observations, the NPMLE attaches all its mass to them. This is well discussed in 
the form of the next proposition. 


Proposition 2.1 


If each observed censored interval $(L_i, R_i)$ contains at least one uncensored observation
$T_j$, $j \neq i$, then any distribution function which satisfies (15.4) attaches all its
mass on the uncensored observations.

On the other hand, the authors also discuss a situation where the censoring interval
$(L_i, R_i)$ does not contain any uncensored observations. If a similar situation occurs
in a right censoring setup, the extra mass is left unassigned. However, under a middle
censoring scheme there exists a natural choice where the mass may be assigned to
the midpoint of the corresponding interval. We will now look at a basic construction
involving a couple of intervals, which will highlight the application of the proposed
methodology. Let $n = 5$ and $z_1 = 2$, $z_2 = 4$, $z_3 = 6$, $z_4 = (1, 5)$, $z_5 = (3, 7)$.
Let $p_1, p_2, p_3$ be the masses to be assigned to $z_1$, $z_2$ and $z_3$. Since $z_4$ contains
$z_1$ and $z_2$ while $z_5$ contains $z_2$ and $z_3$, the likelihood function becomes


$$p_1 p_2 p_3 (p_1 + p_2)(p_2 + p_3).$$


Since the $p_i$'s add up to 1 and the likelihood is symmetric in $p_1$ and
$p_3$, one can reduce the problem to that of maximizing $x^2 (1 - 2x)(1 - x)^2$ with
$p_1 = p_3 = x$ and $p_2 = 1 - 2x$. Setting the derivative of the logarithm to zero gives
$5x^2 - 5x + 1 = 0$, whose root in $(0, 1/2)$ is $x = (5 - \sqrt{5})/10$,
so that $p_2 = 1 - 2x = 1/\sqrt{5}$.
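The toy example above can also be checked numerically. The following is a minimal sketch, not the implementation of [20]: it assumes that all mass sits on the uncensored points (the situation of Proposition 2.1) and iterates the self-consistency update, in which each censored interval shares its unit of mass among the uncensored points it contains in proportion to their current masses. The function name `self_consistent_masses` is hypothetical.

```python
import math

def self_consistent_masses(points, intervals, n_iter=200):
    """Iterate the self-consistency (EM-type) update for masses on uncensored points.

    Assumes every censoring interval contains at least one uncensored point.
    """
    n = len(points) + len(intervals)
    p = [1.0 / len(points)] * len(points)      # uniform starting masses
    for _ in range(n_iter):
        new_p = [1.0] * len(points)            # each uncensored point counts itself once
        for left, right in intervals:
            inside = [j for j, t in enumerate(points) if left < t < right]
            total = sum(p[j] for j in inside)
            for j in inside:
                new_p[j] += p[j] / total       # share of this censored observation
        p = [w / n for w in new_p]
    return p

# Data of the toy example: z1 = 2, z2 = 4, z3 = 6, z4 = (1, 5), z5 = (3, 7).
masses = self_consistent_masses([2, 4, 6], [(1, 5), (3, 7)])
x_closed_form = (5 - math.sqrt(5)) / 10
print(masses, x_closed_form)
```

Under these assumptions the iteration settles at the same masses as the closed-form maximizer, which illustrates why the NPMLE and the self-consistency equation agree in this example.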

The authors also prove that the SCE is a uniformly strongly consistent estimator,
and one may refer to their paper for more information and details. A few other related
papers are worth mentioning: [27], where the authors derive a necessary
and sufficient condition for the equivalence of the self-consistent estimators
and NPMLEs; [32], who proposed an inverse probability weighted
approach to estimate the distribution function under middle censoring; and [33],
who demonstrated that the NPMLE can be obtained by using the EM algorithm of [35]
or the self-consistent estimating equation given by [20] with an initial estimator
which puts more weight on the innermost intervals.


15.3 Middle Censoring Under Parametric Models


This section outlines a few interesting estimation strategies under middle censored 
data for some parametric models. Specifically, we will focus on exponential, Weibull 
and Burr-XII lifetime models. 


15.3.1 Middle Censoring Under an Exponential Setup 


This problem is along the lines of [16]. We assume that the observed data is similar
to that of Eq. (15.2) but the sequence $T_1, T_2, \ldots, T_n$ is an i.i.d. sequence following
an exponential distribution with mean $1/\theta_0$. The probability density function thus
becomes


$$f(x; \theta_0) = \begin{cases} \theta_0 e^{-\theta_0 x}, & \text{if } x > 0, \\ 0, & \text{otherwise,} \end{cases} \qquad \theta_0 > 0. \tag{15.6}$$


Further, as seen before, let $(L_1, R_1), \ldots, (L_n, R_n)$ be i.i.d. with some unknown
bivariate distribution $G$, and let $(L_1, Z_1), \ldots, (L_n, Z_n)$ be an i.i.d. sequence where $L_i$ and
$Z_i = R_i - L_i$ are independent exponential random variables which are independent
of $T_i$. Also assume that $L_i$ and $Z_i$ have means $1/\alpha$ and $1/\beta$, respectively. In short, one
assumes that the censoring scheme is independent of the lifetimes. Such assumptions
are standard in the middle censoring literature. In the next subsections, we briefly
outline the maximum likelihood estimator of $\theta_0$ and a Bayesian setup for
estimating the parameter $\theta$.


15.3.1.1 Maximum Likelihood Estimator 


Without loss of generality, let the first $n_1$ and the remaining $n_2$ observations ($n_1 + n_2 = n$)
be uncensored and censored, respectively. Thus, the observed data becomes


$$\{(T_1, 1), \ldots, (T_{n_1}, 1), (L_{n_1+1}, R_{n_1+1}), \ldots, (L_{n_1+n_2}, R_{n_1+n_2})\}.$$


One can hence write down the likelihood function which takes the form, 


$$l(\theta) = c\,\theta^{n_1} e^{-\theta \sum_{i=1}^{n_1} t_i} \prod_{i=n_1+1}^{n_1+n_2} \left(e^{-\theta l_i} - e^{-\theta r_i}\right), \tag{15.7}$$


where $c$ is the normalizing constant depending on $\alpha$ and $\beta$. To find the MLE, one takes
the derivative of the logarithm of the above likelihood function, sets it equal to 0 and solves.
However, this equation does not give a closed-form solution and one has
to resort to an iterative procedure. Let the likelihood equation be rewritten in the fixed-point form $h(\theta) = \theta$,
where $h(\theta)$ is a function of the variable and the constants. Thus, a basic iterative
procedure can be to start with an initial guess $\theta^{(1)}$, then obtain $\theta^{(2)} = h(\theta^{(1)})$ and
so on. The procedure can be stopped if $|\theta^{(i)} - \theta^{(i+1)}| < \epsilon$, where $\epsilon$ is a predefined
small positive number. One needs an initial guess for $\theta$, and one can pick
$\theta^{(1)} = n_1 / \sum_{i=1}^{n_1} t_i$. A suitable EM algorithm can also be applied but we omit
further details; one may refer to the source paper. The authors also outline a series of
lemmas and propositions which establish the convergence of the iterative procedure, the
likelihood function and the MLE, along with the consistency of the MLE
and its asymptotic distribution.
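The iterative procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of [16]: the map `h` below is obtained here by rearranging the derivative of the log of (15.7), written with $z_i = r_i - l_i$, into the form $\theta = n_1 / \{\sum t_i + \sum l_i - \sum z_i e^{-\theta z_i}/(1 - e^{-\theta z_i})\}$. The function names and the small data set are hypothetical.

```python
import math

def h(theta, t_obs, intervals):
    """Fixed-point map obtained by rearranging d/dtheta log l(theta) = 0."""
    denom = sum(t_obs) + sum(left for left, _ in intervals)
    for left, right in intervals:
        z = right - left
        denom -= z * math.exp(-theta * z) / (1.0 - math.exp(-theta * z))
    return len(t_obs) / denom

def score(theta, t_obs, intervals):
    """Derivative of the log-likelihood; it should vanish at the MLE."""
    value = len(t_obs) / theta - sum(t_obs) - sum(left for left, _ in intervals)
    for left, right in intervals:
        z = right - left
        value += z * math.exp(-theta * z) / (1.0 - math.exp(-theta * z))
    return value

def mle_fixed_point(t_obs, intervals, eps=1e-12, max_iter=1000):
    theta = len(t_obs) / sum(t_obs)            # initial guess n1 / sum(t_i)
    for _ in range(max_iter):
        new_theta = h(theta, t_obs, intervals)
        if abs(new_theta - theta) < eps:       # stopping rule |theta_i - theta_{i+1}| < eps
            return new_theta
        theta = new_theta
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical data: six uncensored lifetimes and two censoring intervals.
t_obs = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5]
intervals = [(0.4, 1.0), (1.0, 2.5)]
theta_hat = mle_fixed_point(t_obs, intervals)
print(theta_hat, score(theta_hat, t_obs, intervals))
```

The `score` function is included only as a check that the returned value solves the likelihood equation.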


15.3.1.2 Bayesian Analysis 


Rather than using the conventional MLE for estimating the parameter $\theta$, one
can approach the problem from a Bayesian angle. Let the parameter $\theta$ have a
Gamma$(a, b)$ prior distribution, with the pdf given by


$$\pi(\theta) = \pi(\theta \mid a, b) = \frac{b^a}{\Gamma(a)} \theta^{a-1} e^{-b\theta}. \tag{15.8}$$


The likelihood function of the observed data hence becomes 


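$$l(\text{data} \mid \theta) = c\,\theta^{n_1} e^{-\theta \sum_{i=1}^{n_1} t_i} \prod_{i=n_1+1}^{n_1+n_2} \left(1 - e^{-\theta z_i}\right) e^{-\theta \sum_{i=n_1+1}^{n_1+n_2} l_i}, \qquad z_i = r_i - l_i. \tag{15.9}$$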


Now based on (15.8) and (15.9), the posterior density is given by 


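$$\pi(\theta \mid \text{data}) = \frac{l(\text{data} \mid \theta)\,\pi(\theta)}{\int_0^\infty l(\text{data} \mid \theta)\,\pi(\theta)\,d\theta}. \tag{15.10}$$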


It turns out that if there are no censored observations, the posterior is Gamma$(a + n, b + \sum_{i=1}^{n} t_i)$.
However, if there exist censored observations, a Gibbs sampling
technique proves useful to find the estimators. In this case, one can start by
looking at the density of a lifetime restricted to its censoring interval $(L, R)$, which is given by


$$f_{T \mid T \in (L, R)}(t \mid \theta) = \frac{\theta e^{-\theta t}}{e^{-\theta L} - e^{-\theta R}}, \qquad L < t < R. \tag{15.11}$$


A rough outline of the Gibbs sampling technique is as follows: 


1. Generate $\theta_{(1)}$ from a Gamma$(a + n_1, b + \sum_{i=1}^{n_1} t_i)$ distribution.
2. Generate the incomplete data $t_i^*$, $i = n_1 + 1, \ldots, n$, from the restricted distribution given in (15.11).
3. Generate $\theta_{(2)}$ from Gamma$(a + n, b + \sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n} t_i^*)$.
4. Go to Step 2 and replace $\theta_{(1)}$ by $\theta_{(2)}$. Repeat Steps 2 and 3 $N$ times.


Finally, the Bayes estimate under squared error loss is given by 


$$\hat{\theta}_{\text{Bayes}} = \frac{1}{N - M} \sum_{i=M+1}^{N} \theta_{(i)}, \tag{15.12}$$


where $M$ is the number of burn-in samples.
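The Gibbs steps above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the censored lifetimes are drawn from the exponential density restricted to each interval by inverting its distribution function, the Gamma distribution is parameterized by shape and rate as in (15.8), and the function names, hyperparameters and data are hypothetical. Note that Python's `random.gammavariate` takes a scale, so the rate is inverted.

```python
import math
import random

def sample_truncated_exp(theta, left, right, rng):
    """Inverse-CDF draw from the exponential density restricted to (left, right)."""
    u = rng.random()
    width = 1.0 - math.exp(-theta * (right - left))
    return left - math.log(1.0 - u * width) / theta

def gibbs_exponential(t_obs, intervals, a, b, n_iter=5000, burn_in=1000, seed=0):
    rng = random.Random(seed)
    n1 = len(t_obs)
    n = n1 + len(intervals)
    sum_t = sum(t_obs)
    # Step 1: start from the posterior based on the uncensored data only.
    theta = rng.gammavariate(a + n1, 1.0 / (b + sum_t))
    draws = []
    for _ in range(n_iter):
        # Step 2: impute each censored lifetime inside its own interval.
        t_star = [sample_truncated_exp(theta, l, r, rng) for l, r in intervals]
        # Step 3: complete-data Gamma update (shape a + n, rate b + sum of all lifetimes).
        theta = rng.gammavariate(a + n, 1.0 / (b + sum_t + sum(t_star)))
        draws.append(theta)
    kept = draws[burn_in:]                     # discard the first M = burn_in draws
    return sum(kept) / len(kept)               # Bayes estimate under squared error loss

# Hypothetical data and hyperparameters.
t_obs = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5]
intervals = [(0.4, 1.0), (1.0, 2.5)]
theta_bayes = gibbs_exponential(t_obs, intervals, a=1.0, b=1.0)
print(theta_bayes)
```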

A related work connecting an exponential distribution with middle censoring is 
by [4], where the authors also add competing risks data. We will briefly discuss the 
methodology in the next subsection. 
15.3.2 Middle Censored Competing Risks Data Under Exponential Distribution


Reference [4] discusses possible estimation, both under a frequentist and a Bayesian 
approach, of the exponential parameter under middle censoring and competing risks 
data, which we will briefly explore for completeness. 

Assume that $n$ identical units are put on test, and $T_i$ is the failure time of the
$i$-th unit. An alternative notation for the failure times is


$$T_i = \min\{X_{i1}, X_{i2}, \ldots, X_{is}\},$$


where $X_{ij}$ is the latent failure time of the $i$-th unit under the $j$-th cause of failure,
$i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, s$. A general notion under competing risks data
is that the latent failure times are independent but not observable, whereas $T_i$ and $C_i$ are
observable, where $C_i = j$ if the failure is due to cause $j$. Further, let $X_{ij}$ follow an
exponential distribution with pdf


$$f_j(t; \theta_j) = \theta_j e^{-\theta_j t}, \qquad t, \theta_j > 0. \tag{15.13}$$


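Now using the independence of the latent lifetimes, one can derive the joint density of the lifetime
and the corresponding cause $(T_i, C_i)$, which is given by


$$f_{T,C}(t, j; \theta) = f_j(t; \theta_j) \prod_{l=1,\, l \neq j}^{s} [1 - F_l(t; \theta_l)] = \theta_j e^{-\theta_* t}, \tag{15.14}$$

where $\theta = (\theta_1, \ldots, \theta_s)$ and $\theta_* = \sum_{j=1}^{s} \theta_j$. Now similar to the construction of the
observed data seen in the last subsection, the data in this case becomes


$$(Y_i, C_i, \delta_i) = \begin{cases} (T_i, C_i, 1) & \text{if } T_i \notin [L_i, R_i], \\ ([L_i, R_i], C_i, 0) & \text{if } T_i \in [L_i, R_i]. \end{cases} \tag{15.15}$$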


Further, we assume that $(L_1, Z_1), \ldots, (L_n, Z_n)$ are i.i.d., where $L_i$ and $Z_i = R_i - L_i$
are independent random variables with exponential distributions having pdfs


$$f_L(l; \alpha) = \alpha e^{-\alpha l}$$


and


$$f_Z(z; \beta) = \beta e^{-\beta z},$$


$l, \alpha, \beta, z > 0$. Note that $\alpha, \beta$ do not depend on $\theta$ and $L_i, Z_i$ are independent of $T_i$.
Now again assuming a simple reordering of the data, the final observed data is given
by

$$\{(T_1, C_1, 1), \ldots, (T_{n_1}, C_{n_1}, 1), ([L_{n_1+1}, R_{n_1+1}], C_{n_1+1}, 0), \ldots, ([L_{n_1+n_2}, R_{n_1+n_2}], C_{n_1+n_2}, 0)\}.$$


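Based on this data, the likelihood function can be written as


$$L(\theta) \propto \prod_{j=1}^{s} \prod_{i=1}^{n_1} [f_{T,C}(t_i, j; \theta)]^{I(C_i = j)} \prod_{j=1}^{s} \prod_{i=n_1+1}^{n_1+n_2} [F_{T,C}(r_i, j; \theta) - F_{T,C}(l_i, j; \theta)]^{I(C_i = j)} \tag{15.16}$$

$$= \left(\frac{1}{\theta_*}\right)^{n - n_1} \prod_{j=1}^{s} \theta_j^{m_j}\, e^{-\theta_* \left(\sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n_1+n_2} l_i\right)} \prod_{i=n_1+1}^{n_1+n_2} \left(1 - e^{-\theta_* z_i}\right), \tag{15.17}$$


where $z_i = r_i - l_i$ and $m_j = \sum_{i=1}^{n} I(C_i = j)$ is the total number of failures due to
cause $j$. A routine method to find the MLE is to take the derivative of the log-likelihood, set it equal
to 0 and solve. It can be seen that the MLE of $\theta_j$, say $\hat{\theta}_j$, satisfies the following
equation:

$$\hat{\theta}_j = \frac{m_j}{n_1 + n_2}\, \hat{\theta}_*, \tag{15.18}$$


$j = 1, 2, \ldots, s$, where $\hat{\theta}_*$ is the MLE of $\theta_*$. Since the equation for $\hat{\theta}_*$ does not have
an analytical solution, one has to resort to an appropriate iterative scheme or an EM
algorithm. One may refer to the source paper for a suitable methodology and further
details. We will now briefly look at a Bayesian approach to estimate the
unknown parameters $\theta_j$, $j = 1, 2, \ldots, s$.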
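The step from the likelihood to the proportionality relation (15.18) can be sketched as follows. This is a derivation under the assumption that the likelihood has the product form in (15.17), with $z_i = r_i - l_i$ and $\theta_* = \sum_j \theta_j$; the symbol $K(\theta_*)$ is introduced here only for illustration and does not appear in the source paper.

```latex
\begin{align*}
\log L(\theta) &= -n_2 \log\theta_* + \sum_{j=1}^{s} m_j \log\theta_j
   - \theta_* \Big(\sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n_1+n_2} l_i\Big)
   + \sum_{i=n_1+1}^{n_1+n_2} \log\big(1 - e^{-\theta_* z_i}\big) + \text{const}, \\
\frac{\partial \log L}{\partial \theta_j} &= \frac{m_j}{\theta_j} - K(\theta_*) = 0,
   \qquad
   K(\theta_*) = \frac{n_2}{\theta_*} + \sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n_1+n_2} l_i
   - \sum_{i=n_1+1}^{n_1+n_2} \frac{z_i e^{-\theta_* z_i}}{1 - e^{-\theta_* z_i}}, \\
\theta_j &= \frac{m_j}{K(\theta_*)} \;\Longrightarrow\;
   \theta_* = \sum_{j=1}^{s} \theta_j = \frac{n}{K(\theta_*)} \;\Longrightarrow\;
   \hat{\theta}_j = \frac{m_j}{n}\, \hat{\theta}_* .
\end{align*}
```

Since $K$ does not depend on $j$, every cause-specific rate is the same fraction $m_j/n$ of the total rate, and only the scalar equation $\theta_* K(\theta_*) = n$ has to be solved numerically.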


15.3.2.1 Bayesian Estimation 


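A suitable prior distribution for each $\theta_j$ in this case is a Gamma$(a_j, b_j)$ distribution. We
assume that all the hyperparameters $a_j, b_j$ are known. Following the standard
approach, the posterior density can be derived and is seen to be proportional to


$$\pi(\theta \mid \text{data}) \propto \prod_{j=1}^{s} \theta_j^{m_j + a_j - 1} e^{-\theta_j \left(b_j + \sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n_1+n_2} l_i\right)} \prod_{i=n_1+1}^{n_1+n_2} \left[\frac{1}{\theta_*}\left(1 - e^{-\theta_* z_i}\right)\right]. \tag{15.19}$$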

The Bayesian estimator of $\theta_j$ under a squared error loss function can thus be obtained
as $E(\theta_j \mid \text{data})$. Again, the Bayesian estimator cannot be obtained analytically
and hence one has to resort to suitable approximation or sampling methods. The authors outline
two such methods, namely Lindley's approximation and Gibbs sampling, to obtain the
estimators. A point to note is that if there do not exist any censored observations, the
Bayesian estimator of $\theta_j$ can be easily evaluated as $(m'_j + a_j)/(b_j + \sum_{i=1}^{n} t_i)$, where
$m'_j = \sum_{i=1}^{n} I(C_i = j)$. We now provide the form of the estimator under Lindley's
approximation just for completeness.

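Let $\hat{\theta}_j^{L}$ denote the estimator obtained through Lindley's approximation, which is
given by

$$\hat{\theta}_j^{L} = \hat{\theta}_j + \sum_{j'=1}^{s} \left(\frac{a_{j'} - 1}{\hat{\theta}_{j'}} - b_{j'}\right) \sigma_{j'j} + \frac{1}{2} \sum_{i=1}^{s} \sum_{j'=1}^{s} \sum_{k=1}^{s} L_{ij'k}\, \sigma_{ij'}\, \sigma_{kj}, \tag{15.20}$$

where

$$L_{ij'k} = \frac{\partial^3 \log L(\theta)}{\partial\theta_i\, \partial\theta_{j'}\, \partial\theta_k}\bigg|_{\theta = \hat{\theta}} = \begin{cases} \dfrac{2 m_i}{\hat{\theta}_i^3} - A, & i = j' = k, \\ -A, & \text{otherwise,} \end{cases}$$

$$A = \frac{2 n_2}{\hat{\theta}_*^3} - \sum_{i=n_1+1}^{n_1+n_2} \frac{z_i^3 e^{-\hat{\theta}_* z_i}\left(1 + e^{-\hat{\theta}_* z_i}\right)}{\left(1 - e^{-\hat{\theta}_* z_i}\right)^3},$$

where $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_s)$ and $\sigma_{ij'}$ is the $(i, j')$-th element of the inverse of the matrix
$[-L_{ij'}]_{s \times s}$, $i, j' = 1, 2, \ldots, s$.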


15.3.2.2. Interval Estimation 


We will now give a flavour of interval estimation based on the MLEs. Using the
standard assumption of asymptotic normality of the MLEs under some regularity
conditions, one can construct suitable confidence intervals for the unknown parameters.
On utilizing the asymptotic distribution of the MLEs, the sampling distribution
of $(\hat{\theta}_j - \theta_j)/\sqrt{\sigma_{jj}}$, $j = 1, 2, \ldots, s$, can be approximated by a standard normal
distribution. Thus for a given $\gamma \in (0, 1)$, a two-sided $100(1 - \gamma)\%$ confidence interval
can be constructed as $\hat{\theta}_j \pm z_{\gamma/2}\sqrt{\sigma_{jj}}$, where $z_{\gamma/2}$ is the usual upper percentile point
of a standard normal distribution. For the Bayesian analysis, using the posterior samples
generated by the Gibbs sampler and the method described in [8], a highest posterior density
(HPD) credible interval can be easily constructed.


15.3.2.3 Reconstruction of Censored Failure Times 


The authors have also discussed an interesting and important issue, namely that
of reconstructing the censored failure times. If $\delta_i = 0$ in a middle censoring scheme,
one only observes the censoring interval $[L_i, R_i]$, so the actual failure time remains unknown.
A possible remedy is to reconstruct the censored failure times.
Several point reconstructors of the censored failure time $T_i$
are obtained, such as the unbiased conditional reconstructor, the conditional median
and mode reconstructors, and the Bayesian reconstructor. We will briefly outline one
of them here.

A good choice of a point reconstructor of $T_i$ happens to be $E(T_i \mid \text{data}) = E(T_i \mid T_i \in [l_i, r_i])$.
Thus, upon invoking the conditional distribution of $T_i$ given
$T_i \in [l_i, r_i]$, one can derive the point reconstructor as

$$\hat{T}_i = l_i + \frac{1}{\theta_*} - \frac{z_i e^{-\theta_* z_i}}{1 - e^{-\theta_* z_i}}. \tag{15.21}$$

One can also derive the conditional expectation of $T_i$ given $T_i \in [L_i, R_i]$ when the interval is random, which is
given by


$$E(T_i \mid T_i \in [L_i, R_i]) = \frac{\alpha + \beta + 2\theta_*}{(\alpha + \theta_*)(\beta + \theta_*)}. \tag{15.22}$$

If the parameter $\theta_*$ is unknown, one can replace it with its MLE. Further, the authors
also discuss a comparison among the three above-mentioned reconstruction methods
using a suitable Pitman's measure of closeness. We will leave out the details for
brevity.
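As a quick check of the conditional-mean reconstructor, the sketch below compares the closed form for the mean of an exponential lifetime restricted to a fixed interval $[l, r]$, namely $l + 1/\theta_* - z e^{-\theta_* z}/(1 - e^{-\theta_* z})$ with $z = r - l$, against a direct numerical integration of the restricted density. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def conditional_mean(left, right, theta):
    """Closed-form E(T | left <= T <= right) for an exponential lifetime with rate theta."""
    z = right - left
    return left + 1.0 / theta - z * math.exp(-theta * z) / (1.0 - math.exp(-theta * z))

def conditional_mean_numeric(left, right, theta, steps=20000):
    """Same quantity by the midpoint rule: integral of t*f(t) divided by integral of f(t)."""
    width = (right - left) / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        t = left + (k + 0.5) * width
        density = theta * math.exp(-theta * t)
        numerator += t * density
        denominator += density
    return numerator / denominator

# Hypothetical interval and rate.
reconstructed = conditional_mean(1.0, 2.5, 0.9)
check = conditional_mean_numeric(1.0, 2.5, 0.9)
print(reconstructed, check)
```

The reconstructed value always lies inside the censoring interval and sits below its midpoint, reflecting the decreasing exponential density.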

We would like to mention a few other recent but related papers for complete- 
ness. The first one is by [13], where the authors have developed a dependent middle 
censoring scheme with a censoring interval of fixed length, where the lifetime and 
lower bound of the censoring interval are variables with a Marshall-Olkin bivari- 
ate exponential distribution. The next is by [36], where the author established point 
and interval estimation problems for the exponential distribution when the failure 
data is middle censored with two independent competing failure risks. The next is 
by [29], where the authors have dealt with the modelling of cumulative incidence 
function using improper Gompertz distribution based on middle censored competing 
risks survival data. The last one is by [37], where the authors considered a depen- 
dent competing risks model under a Marshall-Olkin bivariate Weibull distribution. 
A suitable MLE approach, a midpoint approximation estimator and appropriate con- 
fidence intervals are designed. Additionally, a Bayesian approach using a Gamma- 
Dirichlet prior distribution is proposed, where the estimators are obtained through 
an acceptance-rejection sampling method. One may refer to the source papers for 
more details. 


15.3.3 Middle Censoring Under a Few Other Parametric 
Models 


We will now briefly outline the parametric estimation procedure under middle censored
data where the lifetimes follow a Weibull distribution and a Burr-XII distribution,
respectively.


15.3.3.1 Estimation Under a Weibull Model 


This work is along the lines of [30]. Let $T$ be a non-negative lifetime of a subject
under study, which follows a Weibull$(\alpha, \beta)$ distribution having the following density
function:

$$f(t) = \alpha\beta^{-1}(\beta^{-1}t)^{\alpha-1} \exp\{-(\beta^{-1}t)^{\alpha}\}; \quad t > 0,\ \alpha > 0,\ \beta > 0. \tag{15.23}$$


Further, let $(U, V)$ be the censoring interval, which is independent of $T$ and has a
bivariate cumulative distribution $G$. Also assume that $Z$ denotes a set of covariates
which may be continuous or mere indicator variables. Thus, the observed vector is
denoted by

$$X = \begin{cases} T, & \text{if } \delta = 1, \\ (U, V), & \text{if } \delta = 0, \end{cases} \tag{15.24}$$

where $\delta$ is the usual indicator variable. The authors assume a Cox proportional
hazards model under which the conditional density takes the form


$$f(t \mid z) = \alpha\gamma^{-1}(\gamma^{-1}t)^{\alpha-1} \exp\{-(\gamma^{-1}t)^{\alpha}\}; \quad t, \alpha, \gamma > 0, \tag{15.25}$$


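where $\gamma = \beta e^{-\theta' z/\alpha}$ and $\theta = (\theta_1, \ldots, \theta_p)$ is a vector of regression coefficients. Now
let $(X_i, \delta_i, Z_i)$ be $n$ i.i.d. copies of $(X, \delta, Z)$ corresponding to $n$ subjects under
investigation. One can note that the likelihood function in this case becomes


$$L(\psi) \propto \prod_{i=1}^{n} [f(t_i \mid z_i)]^{\delta_i}\, [S(u_i \mid z_i) - S(v_i \mid z_i)]^{1 - \delta_i}, \tag{15.26}$$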


where $S(\cdot \mid z)$ denotes the conditional survival function and $\psi$ is the collective parameter vector given by $(\alpha, \beta, \theta_1, \ldots, \theta_p)'$. Again in
this case, finding an analytical solution for the MLEs is not possible and hence one
may resort to an EM algorithm, which is outlined in the source paper. We leave
out the details for brevity. It is also shown that the MLE $\hat{\psi} = (\hat{\alpha}, \hat{\beta}, \hat{\theta})$, suitably centred at $\psi$ and scaled,
asymptotically follows a $(p + 2)$-variate normal distribution with mean vector 0 and
dispersion matrix $I^{-1}(\psi)$, where $I(\psi)$ denotes the Fisher information matrix that can
be found after a set of usual manipulations.


15.3.3.2 Estimation Under a Burr-XII Distribution 


The Burr-XII distribution was introduced by [7] and is used to model stochastic
events and lifetimes of products, as it can have a non-monotone hazard function. This
estimation methodology was proposed by [2]. A random variable is said to follow a
Burr-XII distribution with parameters $a$ and $b$ if the density function is given by


$$f(t) = a b\, t^{b-1} (1 + t^b)^{-(a+1)}, \qquad t, a, b > 0. \tag{15.27}$$
Assume that $T_1, \ldots, T_n$ are i.i.d. Burr-XII variables and let $Z_i = R_i - L_i$, $i = 1, \ldots, n$,
denote the length of the censoring interval, with an exponential distribution
with parameter $\gamma$. We also assume that the left bound of the interval, $L_i$, follows
an exponential distribution with parameter $\lambda$. Moreover, $T_i, L_i, Z_i$ are independent.
We now represent the likelihood function in this case by reordering the
observed data into uncensored and censored observations. Thus, the observed data
becomes


$$\{T_1, \ldots, T_{n_1}, (L_{n_1+1}, R_{n_1+1}), \ldots, (L_{n_1+n_2}, R_{n_1+n_2})\},$$


and the likelihood can be written as


$$L(a, b) = c\,(ab)^{n_1} \prod_{i=1}^{n_1} t_i^{b-1} (1 + t_i^b)^{-(a+1)} \prod_{i=n_1+1}^{n_1+n_2} \left[(1 + l_i^b)^{-a} - (1 + r_i^b)^{-a}\right], \tag{15.28}$$

where $c$ is a normalizing constant that depends on $\lambda$ and $\gamma$. One can follow the usual
pattern of finding the log-likelihood and maximizing it to find the MLEs. However,
an analytical solution is not possible and one has to resort to an iterative procedure.
The authors adopt a Newton-Raphson method for this purpose.
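For readers who wish to experiment, the objective that such an iterative scheme maximizes can be written down directly. The sketch below evaluates the logarithm of (15.28) without the constant $c$, assuming the Burr-XII survival function $S(t) = (1 + t^b)^{-a}$ that corresponds to the density (15.27). The function name and the tiny data set are hypothetical, and no particular optimizer is implied.

```python
import math

def burr_xii_log_likelihood(a, b, t_obs, intervals):
    """Log-likelihood for middle censored Burr-XII data, up to an additive constant."""
    value = len(t_obs) * math.log(a * b)
    for t in t_obs:
        # log density of an exactly observed lifetime (without the factor a*b)
        value += (b - 1) * math.log(t) - (a + 1) * math.log1p(t ** b)
    for left, right in intervals:
        # probability that the lifetime falls inside the censoring interval: S(left) - S(right)
        value += math.log((1 + left ** b) ** (-a) - (1 + right ** b) ** (-a))
    return value

# Hypothetical data: one exact lifetime and one censoring interval.
example_value = burr_xii_log_likelihood(1.0, 1.0, [1.0], [(1.0, 3.0)])
print(example_value)
```

With $a = b = 1$ the exact observation at $t = 1$ contributes $-2\log 2$ and the interval $(1, 3)$ contributes $\log(1/2 - 1/4)$, so the total is $-4\log 2$, which gives a simple hand check of the function.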


15.4 Inference for Discrete Middle Censored Data 


Reference [12] developed a discrete middle censoring problem where
the lifetime, lower bound and length of the censoring interval are variables with the
geometric distribution. A brief outline of the problem is as follows: Let $T_1, T_2, \ldots, T_n$
be a random sample of size $n$ from a geometric distribution with mean $1/\theta$. The pmf
is given by

$$P(T_i = t_i) = \theta\bar{\theta}^{t_i}, \tag{15.29}$$


where $\bar{\theta} = 1 - \theta$, $t_i = 0, 1, 2, \ldots$ Further, let $(L_1, X_1), \ldots, (L_n, X_n)$ be i.i.d.,
where $L_i$ and $X_i = R_i - L_i$ are independent geometric random variables with means
$1/p_l$ and $q_x/p_x$, where

$$P(L_i = l) = p_l q_l^{l}, \qquad q_l = 1 - p_l, \qquad l = 0, 1, 2, \ldots,$$


and

$$P(X_i = x) = p_x q_x^{x-1}, \qquad q_x = 1 - p_x, \qquad x = 1, 2, \ldots,$$


where $p_l, p_x$ do not depend on $\theta$, and $T_i$ is independent of $L_i$ and $X_i$, $i = 1, 2, \ldots$


15.4.1 Maximum Likelihood Estimator 


As seen in the previous section, the MLE of $\theta$ does not exist in a closed form, and
hence an appropriate EM algorithm has to be applied. To be in line with the
notation used in the paper, we first define the following two random variables:

$$U_i = \begin{cases} T_i, & T_i \notin [L_i, R_i], \\ [L_i, R_i], & T_i \in [L_i, R_i], \end{cases} \tag{15.30}$$


and


$$V_i = \begin{cases} 1, & T_i \notin [L_i, R_i], \\ 0, & T_i \in [L_i, R_i], \end{cases} \tag{15.31}$$

$i = 1, 2, \ldots, n$. Further, let $W_i = (U_i, V_i)$, $i = 1, 2, \ldots, n$, and $W = (W_1, \ldots, W_n)$.
In order to get the pmf of $W$, one makes use of the
variables $U, V$ from Eqs. (15.30) and (15.31). The pmf of $W$ thus becomes


$$P_\theta(W = w) = P_\theta(U = u, V = v) = \left[c_2\, \theta\bar{\theta}^{u}\right]^{v} \left[c_1\, \bar{\theta}^{l}\left(1 - \bar{\theta}^{x+1}\right)\right]^{1-v},$$


where $c_1 = P(L = l, R = r)$, $c_2 = P(u \notin [L, R])$ and $x = r - l$. Again as
assumed earlier, let the first $n_1$ observations be uncensored, whereas the remaining $n_2$
observations are censored. The final observed data thus becomes


$$W = [(T_1, 1), \ldots, (T_{n_1}, 1), ([L_{n_1+1}, R_{n_1+1}], 0), \ldots, ([L_{n_1+n_2}, R_{n_1+n_2}], 0)].$$


The likelihood function takes the form, 


$$L(\theta) = c\,\theta^{n_1} (1 - \theta)^{\sum_{i=1}^{n_1} t_i + \sum_{i=n_1+1}^{n_1+n_2} l_i} \prod_{i=n_1+1}^{n_1+n_2} \left(1 - \bar{\theta}^{x_i+1}\right), \tag{15.32}$$

where $c = c_1^{n_2} c_2^{n_1}$ is the normalizing constant which does not depend on $\theta$. Again,
one cannot find an explicit solution for the MLE and hence one has to resort to an
EM algorithm. Further, one can also show that the MLE is a consistent estimator of
$\theta$. We omit details for brevity and one may refer to the original paper, where the
authors have discussed several lemmas in this regard.


15.4.2 Bayesian Inference 


In this context, we assume a Beta$(a, b)$ prior distribution for the parameter $\theta$. The
pdf takes the form


$$\pi(\theta) \propto \theta^{a-1} (1 - \theta)^{b-1}, \qquad 0 < \theta < 1,\ a, b > 0. \tag{15.33}$$


Now denoting $x_i = x_{n_1+i}$ and $l_i = l_{n_1+i}$, one can rewrite the likelihood function from
(15.32) as

$$L(\theta) = L(w \mid \theta) = c\,\theta^{n_1}\, \bar{\theta}^{\sum_{i=1}^{n_1} t_i + \sum_{i=1}^{n_2} l_i} \prod_{i=1}^{n_2} \left(1 - \bar{\theta}^{x_i+1}\right). \tag{15.34}$$


The posterior density of $\theta$ can be written as


$$\pi(\theta \mid w) = \frac{L(w \mid \theta)\,\pi(\theta)}{\int_0^1 L(w \mid \theta)\,\pi(\theta)\,d\theta}. \tag{15.35}$$


After a few simplifications, one can write the posterior density of $\theta$ and come up
with its Bayes estimator. We omit details for brevity. It turns out that when $n_2$ is small,
the evaluation of $E(\theta \mid w)$ is straightforward. However, when $n_2$ is large, it is difficult to
obtain a numerical solution. One can apply a suitable Metropolis-Hastings algorithm
as per [9]. The authors consider the starting density as a Beta(3, 3) distribution and
calculate the estimated risk using the formula


$$ER(\hat{\theta}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{\theta}_i - \theta)^2, \tag{15.36}$$


where $\hat{\theta}_i$ is an estimate of $\theta$ in the $i$-th replication and $N$ is the number of replications.
One can also easily construct a credible interval for $\theta$ using the procedure described
in [8].
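A Metropolis-Hastings step for this posterior can be sketched as follows. The sketch rests on several assumptions that are not taken from [12]: the lifetime has pmf $\theta(1-\theta)^t$ on $\{0, 1, 2, \ldots\}$, a censored lifetime is only known to lie in the closed interval $[l, r]$ so that its probability is $(1-\theta)^l - (1-\theta)^{r+1}$, and the Beta(3, 3) density is used as an independence proposal. All function names and the data are hypothetical.

```python
import math
import random

def log_weight(theta, t_obs, intervals, a, b):
    """Log of (unnormalised posterior / Beta(3, 3) proposal density)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    q = 1.0 - theta
    value = (a - 1 + len(t_obs)) * math.log(theta) + (b - 1 + sum(t_obs)) * math.log(q)
    for left, right in intervals:
        value += math.log(q ** left - q ** (right + 1))   # P(left <= T <= right)
    return value - 2.0 * math.log(theta) - 2.0 * math.log(q)

def mh_posterior_mean(t_obs, intervals, a=1.0, b=1.0, n_iter=20000, burn_in=2000, seed=1):
    rng = random.Random(seed)
    theta = 0.5
    current = log_weight(theta, t_obs, intervals, a, b)
    total = 0.0
    for step in range(n_iter):
        proposal = rng.betavariate(3.0, 3.0)               # independence proposal
        candidate = log_weight(proposal, t_obs, intervals, a, b)
        # accept with probability min(1, weight(proposal) / weight(current))
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if step >= burn_in:
            total += theta
    return total / (n_iter - burn_in)

def grid_posterior_mean(t_obs, intervals, a=1.0, b=1.0, steps=20000):
    """Reference value by midpoint-rule integration of the unnormalised posterior."""
    num = den = 0.0
    for k in range(steps):
        theta = (k + 0.5) / steps
        w = math.exp(log_weight(theta, t_obs, intervals, a, b)) * (theta * (1 - theta)) ** 2
        num += theta * w
        den += w
    return num / den

# Hypothetical discrete data: six exact lifetimes and two censoring intervals.
t_obs = [0, 2, 1, 3, 0, 1]
intervals = [(1, 3), (0, 2)]
mh_estimate = mh_posterior_mean(t_obs, intervals)
grid_estimate = grid_posterior_mean(t_obs, intervals)
print(mh_estimate, grid_estimate)
```

The grid integration is included only as a sanity check for the one-parameter case; it is the Metropolis-Hastings part that carries over when direct evaluation becomes awkward.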

A somewhat related and extended work is by [19] where the authors discuss the 
analysis of discrete lifetime data under middle censoring and in the presence of 
covariates. We will briefly look at the same in the next subsection. 


15.4.3 Discrete Lifetime Data Under Middle Censoring 
and Presence of Covariates 


The paper by [19] tries to extend the ideas proposed by [12] by generalizing their 
setup by bringing in covariates. The proofs and results are discussed by exploiting the 
relationship between an exponential and a geometric distribution. This connection is 
used to derive the MLEs under middle censoring and in the presence of covariates, 
by considering an accelerated failure time model for the geometric distribution. 
The following proposition gives the connection between exponential and geometric 
distributions. 


Proposition 4.1 


If $X$ is an exponentially distributed random variable with parameter $\lambda$, then $Y = \lfloor X \rfloor$,
where $\lfloor x \rfloor$ stands for the integer part of $x$, is a geometrically distributed random
variable with parameter $p = 1 - e^{-\lambda}$.

Interestingly, both exponential and geometric distributions satisfy the well-known
“memoryless” property. Thus, by invoking the memoryless property and the above
proposition, one can generate geometric lifetimes from the exponential lifetimes
$T_i \sim \text{Exp}(\lambda)$. The geometric lifetimes can hence be derived as $Y_i = \lfloor T_i \rfloor \sim$
geometric$(p)$, where the pmf becomes


$$P(Y_i = y_i) = p(1 - p)^{y_i},$$


where $y_i = 0, 1, 2, \ldots$ and $p = 1 - e^{-\lambda}$. Now let the left bound of the censored interval
be denoted by $U_i = \lfloor L_i \rfloor \sim$ geometric$(p_u)$, where $L_i \sim \text{Exp}(\alpha)$ and $p_u = 1 - e^{-\alpha}$.
Further, let the length of the interval be denoted by $W_i = \lfloor S_i \rfloor \sim$ geometric$(p_w)$,
where $S_i \sim \text{Exp}(\beta)$, $S_i = R_i - L_i$, $p_w = 1 - e^{-\beta}$ and $W_i = V_i - U_i$, where $V_i$ is
the right bound of the censored interval. It turns out that $Y_i, U_i, W_i$ are independent
for all $i$.
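Proposition 4.1 is easy to check by simulation. The sketch below draws exponential lifetimes, takes their integer parts and compares the empirical frequency of zero and the empirical mean with the geometric values $p$ and $(1 - p)/p$, where $p = 1 - e^{-\lambda}$. The rate, the sample size and the function name are arbitrary choices made for illustration.

```python
import math
import random

def floor_exponential_sample(lam, size, seed=0):
    """Integer parts of i.i.d. exponential draws with rate lam."""
    rng = random.Random(seed)
    return [math.floor(rng.expovariate(lam)) for _ in range(size)]

lam = 0.7
p = 1.0 - math.exp(-lam)

# Exact check of the pmf: P(y <= X < y + 1) equals p * (1 - p)**y for each y.
exact_gaps = [abs((math.exp(-lam * y) - math.exp(-lam * (y + 1))) - p * (1 - p) ** y)
              for y in range(10)]

# Simulation check of the first two summaries.
sample = floor_exponential_sample(lam, 100000)
empirical_p0 = sample.count(0) / len(sample)
empirical_mean = sum(sample) / len(sample)
print(empirical_p0, p, empirical_mean, (1 - p) / p)
```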

We will now look at a geometric model in the presence of covariates, where the
covariates are taken as fixed. Let $T_i$ be the survival time of the $i$-th patient, with
covariates denoted by $Z_i$. The authors then propose an accelerated failure time
model and argue that it is equivalent to the Cox proportional hazards assumption when
the baseline distribution is exponential. The pdf of this particular exponential distribution
is given by

$$f(t \mid Z_i) = \lambda e^{\theta^T Z_i} \exp\left(-\lambda e^{\theta^T Z_i} t\right), \tag{15.37}$$


where $\theta$ is the vector of covariate effects and the superscript $T$ stands for transpose. Hence, the geometric
lifetimes become $Y_i = \lfloor T_i \rfloor \sim$ geometric$(p_i)$, where $p_i = 1 - \exp(-\lambda e^{\theta^T Z_i})$.
The left endpoint, length and right endpoint of the censored interval
are denoted as before by $U_i, W_i, V_i$, respectively. One can thus write the necessary
likelihood function and derive the MLE of $p$. Again, as discussed before, after a
slight reordering of the data, the observed data becomes


$$\{Y_1, Y_2, \ldots, Y_{n_1}, [U_{n_1+1}, V_{n_1+1}], [U_{n_1+2}, V_{n_1+2}], \ldots, [U_{n_1+n_2}, V_{n_1+n_2}]\}.$$


Now similar to the expression seen in [12], the likelihood function can be written
down along the lines of Eq. (15.32). Finally, as an analytical solution cannot be obtained even
here, one can resort to an appropriate EM algorithm. One may refer to the source
paper for more details.


15.4.4 Middle Censoring in the Multinomial Distribution 
with Applications 


Another relevant paper in the discrete distribution area is by [17], where the authors
develop estimation strategies under a multinomial setup and middle censoring. An
interesting application in this context is that of a “consumer rating” scenario. Imagine
that a group of customers are asked to rate a certain product (in terms of stars) which
they have bought, maybe through an online shopping portal. A small addition is that
the customers are allowed to provide interval ratings, say, between 1 and 2 stars or
between 2 and 3 stars, etc. Such a situation can be easily modelled using a multinomial
distribution, which we will briefly explore here.

Assume that the company contacts $n$ individuals, each of them being asked to
rate the product in terms of $\{1, 2, \ldots, k\}$ stars, according to his/her liking for the
product. Let $f_j$ stand for the number (frequency) of people giving $j$ stars. If we denote
the true probabilities of giving $1, 2, \ldots, k$ stars by $p_1, p_2, \ldots, p_k$, respectively, we
have the standard multinomial scheme with $\sum_{j=1}^{k} f_j = n$ and $\sum_{j=1}^{k} p_j = 1$. Now
further assume that each individual is also allowed to give “interval ratings” for the
product. We discuss how the estimated probabilities for each category change. The
next subsection gives a general likelihood function in such a case.


15.4.4.1 The Likelihood Under a General Scheme 


Let $I$ represent an interval, say $j_1$ to $j_2$, of categories with corresponding probability
$P_I = \sum_{j \in I} p_j$ for this interval. Now out of $n$ individuals, let $m$ of them provide
interval ratings that belong to the intervals $\{I_j;\ j = 1, 2, \ldots, m\}$ with $r$ of these
being distinct. The remaining $n - m$ individuals provide specific singleton ratings,
of which let there be $k$. Then the probabilities become


$$\sum_{j=1}^{r} P_{I_j} + \sum_{i=1}^{k} p_i = 1. \tag{15.38}$$


Further assume that the frequency in the interval $I_j$ is $F_j$ and the frequencies in the
$k$ individual categories are $f_i$. Then $\sum_{j=1}^{r} F_j = m$ and $\sum_{i=1}^{k} f_i = n - m$, so that


$$\sum_{j=1}^{r} F_j + \sum_{i=1}^{k} f_i = n. \tag{15.39}$$


One can then easily write down the likelihood function for the vector $p$, which is
given by


$$L(p) \propto \prod_{j=1}^{r} P_{I_j}^{F_j} \prod_{i=1}^{k} p_i^{f_i}. \tag{15.40}$$


Finally, one aims to find the associated MLEs by either using the usual maximization
approach or a suitable iterative method. Note that if certain intervals
overlap, finding analytical MLEs is even more difficult. We now briefly discuss a
situation where there exist two disjoint interval ratings and their associated problem
formulation.

Let $[i_1, j_1]$ and $[i_2, j_2]$ be two disjoint intervals with frequencies $f_{i_1 j_1}$, $f_{i_2 j_2}$ and
respective probabilities $p_{i_1 j_1}$, $p_{i_2 j_2}$. We further have $p_{i_1 j_1} + p_{i_2 j_2} + \sum_{i=1}^{k} p_i = 1$. The
likelihood function then becomes


$$p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}\; p_{i_1 j_1}^{f_{i_1 j_1}}\; p_{i_2 j_2}^{f_{i_2 j_2}}, \tag{15.41}$$

where $1 \le i_1 < j_1 < i_2 < j_2 \le k$ and $p_k = (1 - p_1 - \cdots - 2p_{i_1} - \cdots - 2p_{j_1} - \cdots - 2p_{i_2} - \cdots - 2p_{j_2} - \cdots - p_{k-1})$.
The MLEs in this case exist in analytical form and are given by
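$$\hat{p}_1 = \frac{f_1}{n}, \ldots, \hat{p}_{i_1 - 1} = \frac{f_{i_1 - 1}}{n}, \quad \hat{p}_{j_1 + 1} = \frac{f_{j_1 + 1}}{n}, \ldots, \hat{p}_{i_2 - 1} = \frac{f_{i_2 - 1}}{n}, \quad \hat{p}_{j_2 + 1} = \frac{f_{j_2 + 1}}{n}, \ldots, \hat{p}_k = \frac{f_k}{n}, \tag{15.42}$$

and

$$\hat{p}_{l_1} = \frac{f_{l_1}}{2n}\left(1 + \frac{f_{i_1 j_1}}{\sum_{l=i_1}^{j_1} f_l}\right), \qquad \hat{p}_{l_2} = \frac{f_{l_2}}{2n}\left(1 + \frac{f_{i_2 j_2}}{\sum_{l=i_2}^{j_2} f_l}\right), \tag{15.43}$$


where $l_1 = i_1, (i_1 + 1), \ldots, j_1$ and $l_2 = i_2, (i_2 + 1), \ldots, j_2$. Now if no individual
opts for the given interval ratings, the MLEs take a simpler form and are given by
$\hat{p}_{l_1} = f_{l_1}/2n$ and $\hat{p}_{l_2} = f_{l_2}/2n$. The authors also provide asymptotic variances of the
estimates, but we omit the details for brevity.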
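The closed-form estimates can be computed in a few lines. The sketch below assumes disjoint intervals, each containing at least one singleton rating, and follows the parameterization used above, in which the interval probabilities and the singleton probabilities together sum to one. The function name and the frequencies are hypothetical and are not the survey data of [17].

```python
def interval_rating_mle(freqs, interval_freqs):
    """MLEs for singleton probabilities when disjoint interval ratings are allowed.

    freqs:          dict mapping category -> number of singleton ratings
    interval_freqs: dict mapping (low, high) -> number of interval ratings
    """
    n = sum(freqs.values()) + sum(interval_freqs.values())
    # Categories outside every interval: relative frequency f_i / n.
    p_hat = {category: f / n for category, f in freqs.items()}
    for (low, high), f_interval in interval_freqs.items():
        inside = range(low, high + 1)
        f_inside = sum(freqs[c] for c in inside)
        for c in inside:
            # Categories inside an interval: f_c / (2n) * (1 + F / sum of f inside).
            p_hat[c] = freqs[c] / (2 * n) * (1 + f_interval / f_inside)
    return p_hat

# Hypothetical frequencies for k = 5 stars with interval ratings [1, 2] and [4, 5].
freqs = {1: 10, 2: 15, 3: 30, 4: 12, 5: 8}
interval_freqs = {(1, 2): 9, (4, 5): 6}
p_hat = interval_rating_mle(freqs, interval_freqs)

# Constraint of the model: singleton probabilities plus interval probabilities sum to one.
total = sum(p_hat.values()) + sum(
    sum(p_hat[c] for c in range(low, high + 1)) for (low, high) in interval_freqs
)
print(p_hat, total)
```

The final check confirms that the estimates satisfy the constraint under which the likelihood was maximized.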


15.4.4.2 Bayesian Estimation 


As seen above, the unknown parameter vector is $p = (P_{I_1}, P_{I_2}, \ldots, P_{I_r}, p_1, p_2, \ldots, p_k)$.
Following the usual approach, we assume a Dirichlet prior for the parameter
vector. Hence, the setup is as follows: Let $X = (X_1, X_2, \ldots, X_n)$ denote the choices
of the $n$ individuals, with each $X_i$ taking either an interval or a singleton rating. Hence
we can assume


$$X \mid p \sim \text{Multinomial}(p),$$
$$p \mid \alpha \sim \pi(p) = \text{Dir}(\alpha),$$


where Dir$(\alpha)$ stands for a Dirichlet distribution with parameter vector $\alpha = (\alpha_1^*, \ldots, \alpha_r^*, \alpha_1, \alpha_2, \ldots, \alpha_k)$,
where $\alpha_j^*\,(> 0)$ corresponds to the prior parameter on
each $P_{I_j}$, $j = 1, 2, \ldots, r$. The Dirichlet density is given by


$$\text{Dir}(p \mid \alpha) = \frac{\Gamma\left(\sum_{j=1}^{r} \alpha_j^* + \sum_{i=1}^{k} \alpha_i\right)}{\prod_{j=1}^{r} \Gamma(\alpha_j^*) \prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{j=1}^{r} P_{I_j}^{\alpha_j^* - 1} \prod_{i=1}^{k} p_i^{\alpha_i - 1}. \tag{15.44}$$


Now on applying the usual approach, one can get the posterior density, which is given
by

$$\pi(p \mid \text{data}) = \frac{L(\text{data} \mid p)\,\pi(p)}{\int_p L(\text{data} \mid p)\,\pi(p)\,dp}. \tag{15.45}$$


A point to note is that one can obtain the Bayes estimate of $p$ under some specific
illustrations, which are neatly outlined in the source paper. The authors also include
extensive parametric bootstrap and simulation analyses to complement their ideas.


15.4.4.3 A Real Data Example 


To complement the methodology from the above subsections, we will give a
flavour of a real data application. A survey was conducted at the University of California,
Santa Barbara, by one of the authors of the paper, which asked a class of
students about their overall experience with a software tool for organizing
online meetings and live sessions. They could give ratings of 1 through 5 (5 being
the highest rating) along with a couple of “interval ratings” consisting of [1, 2] and
[4, 5]. The data was collected from a group of $n = 90$ students from the class. This
falls perfectly in the paradigm of middle censoring, especially the situation which is
discussed in Sect. 15.4.4.1. The likelihood and log-likelihood functions in this case take
the following forms:


$$L = p_1^{f_1} p_2^{f_2} p_3^{f_3} p_4^{f_4} p_5^{f_5} (p_1 + p_2)^{f_{12}} (p_4 + p_5)^{f_{45}},$$

and


$$\log L = f_1 \log p_1 + \cdots + f_5 \log p_5 + f_{12} \log(p_1 + p_2) + f_{45} \log(p_4 + p_5),$$


where $p_3 = 1 - 2p_1 - 2p_2 - 2p_4 - 2p_5$. We then obtain the estimated probabilities
(MLEs) from Eqs. (15.42) and (15.43). Table 15.1 outlines results for these
students' observed frequencies. Note that all these results are from a single run (a single
question in the survey).

As can be seen from Table 15.1, the intervals [1, 2] and [4, 5] put a slight
additional mass on $\hat{p}_1$, $\hat{p}_2$ as well as $\hat{p}_4$, $\hat{p}_5$ compared to the original
values of $f_i/n$, $i = 1, 2, 4, 5$.


Table 15.1 Real data example with n = 90 
i 1 2 3 4 5 [1, 2] [4, 5] 
15.5 A General Censoring Scheme for Circular Data


In the previous sections, we have seen a variety of non-parametric and parametric 
estimation methodologies under middle censored data. However, all the datasets 
were examples of usual samples (lying either on the entire real line or a subset 
of it). Reference [21] discusses a general censoring scheme which is similar to a 
middle censoring scheme for circular data. Circular data is a set of independent and 
identically distributed (i.i.d.) values in two-dimensional directions. Such variables 
are also called angular data, as they can be represented on the circumference of a 
circle. A few examples consist of wind directions, the angles made by sea turtles while 
walking from the sea to the mainland, or the vanishing angles for a group of birds. 
Now a middle censored observation arises if an observation falls inside a random 
interval, whose endpoints may be observed but not the actual data point. A simple 
example is that of the bird’s vanishing angle at the horizon, which may be blocked 
due to a passing cloud for a short while in between, thus making it middle censored. 
The authors specifically develop estimation strategies under middle censoring for a 
von Mises distribution, which we will explore next. 

Let $\alpha_1, \ldots, \alpha_n$ be a set of angular measurements following a von Mises distribution.
The pdf is given by


$$f(\alpha; \theta) = \frac{1}{2\pi I_0(\kappa)} e^{\kappa \cos(\alpha - \mu)}, \qquad 0 \le \alpha < 2\pi, \tag{15.46}$$


where $\mu$ is the mean direction and $\kappa$ is called the concentration parameter. Here,
$\theta = (\mu, \kappa) \in \Theta = [0, 2\pi) \times [0, \infty)$ and $I_\nu$ is the modified Bessel function of the
first kind and order $\nu$, which takes the form


$$I_\nu(z) = \frac{1}{2\pi} \int_0^{2\pi} \cos(\nu t)\, e^{z \cos t}\, dt.$$


Now the goal is to estimate the parameter vector under middle censoring, which
means that an observation falling inside a random interval is replaced by that interval,
which forms an arc on the circumference. Let these observed arcs be $(l_i, r_i)$. Then the
likelihood function becomes

$$L_n(\alpha; \theta) = \frac{1}{(2\pi I_0(\kappa))^n} \exp\Big[\kappa \sum_{i=1}^n \delta_i \cos(\alpha_i - \mu)\Big] \prod_{i=1}^n \Big[\int_{l_i}^{r_i} \exp[\kappa \cos(t - \mu)]\, dt\Big]^{1 - \delta_i}, \tag{15.47}$$

where $\delta_i$ is the usual indicator function which takes the value 0 if the observation is
censored and 1 otherwise. The authors further provide the log-likelihood function and
its derivatives, from which the MLE can be found numerically. One can also substitute
the MLE in the information matrix to obtain the observed information matrix $I$.
Moreover, $\sqrt{n}(\hat\theta_n - \theta)$ is asymptotically normal with mean 0 and a covariance matrix
determined by the information matrix.
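To make the structure of the likelihood (15.47) concrete, the sketch below evaluates its logarithm for given $(\mu, \kappa)$. It is only an illustration under stated assumptions: exact angles and censoring arcs are passed separately, each arc integral is approximated by a trapezoidal rule, and the names `arc_integral` and `censored_log_likelihood` are hypothetical, not taken from [21].

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def arc_integral(left, right, mu, kappa, n_grid=2001):
    """Integral of exp(kappa*cos(t - mu)) over the arc from left to right,
    traversed counter-clockwise (so the arc may pass through 0)."""
    width = (right - left) % TWO_PI
    t = left + np.linspace(0.0, width, n_grid)
    y = np.exp(kappa * np.cos(t - mu))
    step = width / (n_grid - 1)
    return step * (y.sum() - 0.5 * (y[0] + y[-1]))  # trapezoidal rule

def censored_log_likelihood(mu, kappa, exact_angles, arcs):
    """Logarithm of (15.47): exact angles have delta_i = 1, arcs have delta_i = 0."""
    exact_angles = np.asarray(exact_angles, dtype=float)
    n = len(exact_angles) + len(arcs)
    value = -n * np.log(TWO_PI * np.i0(kappa))          # normalizing constant, power n
    value += kappa * np.cos(exact_angles - mu).sum()    # uncensored contributions
    for left, right in arcs:                            # censored contributions
        value += np.log(arc_integral(left, right, mu, kappa))
    return float(value)

# Two exact angles and one censoring arc that passes through 0 (radians).
print(censored_log_likelihood(0.5, 1.0, [0.5, 1.0], [(6.0, 0.2)]))
```

A function of this kind could be handed to a generic numerical optimizer to locate the MLE, which is how the text describes the estimation being carried out.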
15.5.1 Non-parametric Estimation of Censored Data


We will now briefly look at the self-consistent estimator of the parameter $\theta$. Let $A_i$ be
a sequence of i.i.d. circular random variables with an unknown distribution $F_0$. Let
$(L_i, R_i)$ be a sequence of i.i.d. random vectors, independent of the $A_i$'s. Assume that both
these components take values in $[0, 2\pi)$. Interestingly, since there is no ordering on a
circle, $R_i$ may happen to be less than the corresponding $L_i$, for example, a censoring
arc from 355 to 18 degrees. Now following the usual procedure, we will equate the distribution
function with the conditional expectation of the empirical distribution given the data.
In this case it becomes

$$F(t) = E_F\big[E_n(t) \mid A_i, (L_i, R_i)\big], \tag{15.48}$$

where $E_n$ is the empirical distribution function. This equation is referred to as the
self-consistency equation according to [14]. Now in the presence of censored data,
the above distribution function satisfies the equation,


$$F(t) = \frac{1}{n} \sum_{i=1}^n \Big\{\delta_i I(A_i \le t) + \bar\delta_i I(R_i \le t) + \bar\delta_i I[t \in (L_i, R_i)]\, \frac{F(t) - F(L_i)}{F(R_i-) - F(L_i)}\Big\}, \tag{15.49}$$

where $\bar\delta_i = 1 - \delta_i$. Again it turns out that the above equation does not have any
closed-form solution and one has to resort to an iterative procedure. The authors also
discuss an NPMLE for the parameter and show that the maximizer of the non-parametric
likelihood function also becomes self-consistent. We skip the details for brevity.
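The iterative procedure can be pictured with the following sketch, which is only one possible reading of (15.49) and not the algorithm of [21]. It assumes that the estimate puts mass only on the uncensored angles and that every censoring arc contains at least one of them; each censored observation then spreads its mass $1/n$ over the uncensored angles inside its arc, in proportion to their current masses. The names `in_arc` and `self_consistent_masses` are hypothetical.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def in_arc(x, left, right):
    """True where angle x lies strictly inside the arc running
    counter-clockwise from left to right (the arc may pass through 0)."""
    width = (right - left) % TWO_PI
    offset = (x - left) % TWO_PI
    return (offset > 0.0) & (offset < width)

def self_consistent_masses(points, arcs, n_iter=200):
    """Iterate a self-consistency update for the masses on the uncensored angles."""
    points = np.asarray(points, dtype=float)
    n = len(points) + len(arcs)
    inside = np.array([in_arc(points, l, r) for l, r in arcs], dtype=bool)
    inside = inside.reshape(len(arcs), len(points))
    if not inside.any(axis=1).all():
        raise ValueError("every censoring arc must contain an uncensored angle")
    p = np.full(len(points), 1.0 / len(points))
    for _ in range(n_iter):
        arc_mass = inside @ p                              # current mass inside each arc
        share = (inside / arc_mass[:, None]).sum(axis=0)   # sum over arcs covering a point
        p = (1.0 + p * share) / n                          # own jump plus redistributed mass
    return p

# Four exact angles and one arc through 0 that covers the angles 6.2 and 0.1.
masses = self_consistent_masses([0.1, 0.2, 3.0, 6.2], [(6.0, 0.15)])
print(masses)
```

The arc membership test uses differences modulo $2\pi$, which is what allows a right endpoint that is numerically smaller than the left one.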


15.6 Additive Risks Regression Model 
for Middle-Censored Data 


Generally in survival studies, covariates are important to represent the differences
in the population. Ideas in this regard were seen in the earlier sections: [30] discusses a
proportional hazards regression model for middle-censored data, and [19] analyses
discrete middle-censored data in the presence of covariates. Here we briefly discuss
the work of [31]. In the literature, a widely applied hazards model is that by [10]. It is
a multiplicative hazards model which takes the form,

$$h(t|z) = h_0(t) \exp(z^\top \theta), \tag{15.50}$$

where $T$ has a baseline hazard function $h_0(t)$, $z$ is a $p \times 1$ vector of covariates, and
$\theta = (\theta_1, \ldots, \theta_p)^\top$ is the vector of regression coefficients. However, the authors argue
that this proportional hazards (PH) model does not fit lifetime data well and hence the
additive risks (AR) model given by [1] proves to be beneficial, and takes the following form:

$$h(t|z) = h_0(t) + z^\top \theta. \tag{15.51}$$


The paper discusses the estimation of the unknown baseline survival function
$S_0(t)$ when the lifetimes are subject to middle censoring.

Now as seen before, let $T$ be the lifetime variable, $(U, V)$ be the censoring interval
having a bivariate cdf given by $G(u, v) = P(U \le u, V \le v)$, and one observes the
vector $(X, \delta, z)$ where $\delta$ is the usual indicator function. In the following subsection,
we briefly outline an estimation technique which seems useful in this case.


15.6.1 Martingale Method of Estimation 


This method consists of developing a partial likelihood function under the model
(15.51), and a stochastic integral representation of the score function derived from
this partial likelihood function is used to infer the unknown regression coefficient.
Let the observed data be $(X_i, z_i, \delta_i)$, $1 \le i \le n$, which are $n$ i.i.d. replicates
of $(X, z, \delta)$. In the presence of middle censoring, let the counting process
corresponding to the $i$th individual be represented by $N_i(t) = I(X_i \le t, \delta_i = 1)$, $t \ge 0$,
which indicates whether the event has occurred by time $t$ or not. One can define the
at-risk process $R_i(t) = I(X_i \ge t, \delta_i = 1) + I(U_i \ge t, \delta_i = 0)$, where the value 1 indicates
that the $i$th individual is at risk at time $t$. Further, let the filtration
$\sigma\{N_i(u), R_i(u+), z_i : i = 1, 2, \ldots, n;\ 0 \le u \le t\}$ be denoted by $\mathcal{F}_t$. Now under model
(15.51), the conditional cumulative hazard rate for the $i$th individual is given by
$H(t|z_i) = H_0(t) + z_i^\top \theta t$, where $H_0(t)$ is the baseline cumulative hazard function.
Model (15.51) also assumes

$$E[dN_i(t) \mid \mathcal{F}_{t-}] = (h_0(t) + \theta^\top z_i) R_i(t)\, dt, \tag{15.52}$$

and the counting process $N_i(t)$ can be written in terms of its intensity as

$$N_i(t) = M_i(t) + \int_0^t R_i(a)\, dH(a|z_i), \tag{15.53}$$

where $M_i(\cdot)$ is a local square integrable martingale. One may refer to [5] for more
details. Now the differential form of the above equation can be written as

$$dN_i(t) = dM_i(t) + R_i(t)\, dH(t|z_i), \tag{15.54}$$

such that

$$\sum_{i=1}^n dM_i(t) = \sum_{i=1}^n \big[dN_i(t) - R_i(t)(dH_0(t) + \theta^\top z_i\, dt)\big] = 0. \tag{15.55}$$

To estimate $\theta$, we will consider the partial likelihood function given by [10, 11],
which is defined as

$$L(\theta) = \prod_{i=1}^k \frac{h(t_{(i)}|z_{(i)})}{\sum_{l=1}^n R_l(t_{(i)})\, h(t_{(i)}|z_l)}, \tag{15.56}$$

where $t_{(1)}, \ldots, t_{(k)}$ are the $k$ observed exact lifetimes arranged in increasing order.


Now one can go along the usual lines of first obtaining the log-likelihood function,
which in terms of the counting process can be defined as

$$C(\theta) = \sum_{i=1}^n \int_0^\infty \log\big(h_0(s) + z_i^\top \theta\big)\, dN_i(s) - \int_0^\infty \log\Big(\sum_{l=1}^n R_l(s)\big(h_0(s) + z_l^\top \theta\big)\Big)\, dN(s), \tag{15.57}$$

where $N(s) = \sum_{i=1}^n N_i(s)$. The score function thus becomes the derivative of the
above function $C(\theta)$; let it be denoted by $U(\theta)$. The estimate of the true regression
coefficient $\theta$ can thus be obtained by slightly simplifying $U(\theta)$, which is given by

$$U(\theta) = \sum_{i=1}^n \int_0^\infty \{z_i - \bar{Z}(t)\}\{dN_i(t) - R_i(t) z_i^\top \theta\, dt\}, \tag{15.58}$$

where $\bar{Z}(t) = \sum_{i=1}^n z_i R_i(t) / \sum_{i=1}^n R_i(t)$. Equation (15.58) assumes that $U(\theta_0)$ is a
martingale integral with mean 0. It is also linear in $\theta$, and one can get the estimator as

$$\hat\theta = \Big[\sum_{i=1}^n \int_0^\infty R_i(t)\{z_i - \bar{Z}(t)\}^{\otimes 2}\, dt\Big]^{-1} \sum_{i=1}^n \int_0^\infty \{z_i - \bar{Z}(t)\}\, dN_i(t), \tag{15.59}$$

where $a^{\otimes 2} = aa^\top$. Using this, one can further obtain the estimator of the cumulative
hazard function $H_0(t)$ given by

$$\hat{H}_0(\hat\theta, t) = \int_0^t \frac{\sum_{i=1}^n \{dN_i(a) - R_i(a) z_i^\top \hat\theta\, da\}}{\sum_{i=1}^n R_i(a)}, \tag{15.60}$$

and the estimator of the corresponding conditional survival function which is found
as

$$\hat{S}(t|z) = \exp\{-\hat{H}_0(\hat\theta, t) - z^\top \hat\theta t\}. \tag{15.61}$$


The authors in their paper have further considered an iterative procedure to find the 
estimators of the underlying parameters, but we will leave out the details for brevity 
as it is quite similar to some of the earlier sections. 
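Because the at-risk processes are piecewise constant, the closed-form estimator (15.59) reduces to finite sums. The sketch below is a minimal illustration under stated assumptions, not the authors' iterative procedure: each individual leaves the risk set at a single exit time (the exact lifetime $X_i$ when $\delta_i = 1$, the left endpoint $U_i$ when $\delta_i = 0$), and the name `additive_risk_theta` is hypothetical.

```python
import numpy as np

def additive_risk_theta(exit_time, delta, Z):
    """Closed-form estimator (15.59) with R_i(t) = I(exit_time_i >= t)."""
    exit_time = np.asarray(exit_time, dtype=float)
    delta = np.asarray(delta, dtype=int)
    Z = np.asarray(Z, dtype=float).reshape(len(exit_time), -1)
    p = Z.shape[1]
    A = np.zeros((p, p))      # sum_i int R_i(t) {z_i - Zbar(t)}^{(x)2} dt
    b = np.zeros(p)           # sum_i int {z_i - Zbar(t)} dN_i(t)
    prev = 0.0
    for t in np.unique(exit_time):          # sorted grid of exit times
        at_risk = exit_time >= t            # risk set is constant on (prev, t]
        zbar = Z[at_risk].mean(axis=0)
        D = Z[at_risk] - zbar
        A += (t - prev) * D.T @ D
        events = (exit_time == t) & (delta == 1)
        b += (Z[events] - zbar).sum(axis=0)
        prev = t
    return np.linalg.solve(A, b)

# Simulated check without censoring: hazard 1 + 0.5*z with a binary covariate.
rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 2, size=n).astype(float)
T = rng.exponential(1.0 / (1.0 + 0.5 * z))
theta_hat = additive_risk_theta(T, np.ones(n), z)
print(theta_hat)
```

With a constant baseline hazard, the simulated lifetimes are exponential with rate $1 + 0.5z$, so the printed estimate should lie in the neighbourhood of 0.5, up to sampling error.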
15.7 Conclusions


In this paper, a detailed review of estimation strategies under middle censored data 
is covered. Even though middle censoring is a relatively new censoring idea, which 
was proposed by [20], a lot of ground has been covered since then, both in terms 
of non-parametric and parametric distribution models. Some of the key distributions 
where middle censoring has been applied and well studied include the exponential, 
Weibull, Burr-XII and other related models. Among the discrete distributions, the 
geometric, its extensions and the multinomial have been covered. As mentioned in 
the introduction, the thesis by Nathan Bennett was an attempt to document some of 
the prominent papers in this area, prior to 2011. This paper focuses on other models,
settings and extensions, most of which were published recently. Some of these
ideas include estimation under middle censoring in the presence of covariates, the
additive risks regression model, estimation under middle-censored competing risks
data, and parameter estimation for circular data. Owing to the page limitation, this paper
omits some important and interesting ideas pertaining to middle censoring.
To mention a few, [18] proposed an approximation for the distribution function
through which the estimator and its asymptotic properties are easily established, [3]
considered the exponential and Weibull distributions to study the robustness of their
parameter estimates under right and middle censoring, and [22] established a
semi-parametric regression model for the analysis of middle-censored lifetime data.
Some of the ideas are also complemented by interesting applications,
e.g., the “interval rating” problem under the multinomial distribution setup. We
hope that this paper will serve as a ready reference for future research in this area.
16.1 Introduction


The minus partial order and the star partial order on the class of rectangular matrices, 
and the sharp order on the class of square matrices are among important matrix 
relations. These relations are studied with the help of different classes of generalized 
inverses to a great extent. Therefore, these partial orders on the respective classes 
of matrices are called G-based partial orders. Several authors in the recent past 
have defined new relations in the class of matrices, considering different types of 
generalized inverses and outer inverses. For more details, readers are referred to 
Drazin [5], Hartwig [7], Nambooripad [18], Hartwig and Drazin [8], Mitra [14], 
Baksalary and Mitra [1], Mitra [15], Mitra and Hartwig [16], Ben-Israel and Greville 
[4], Baksalary and Trenkler [2], Mitra et al. [17], Wang [23] and the references 
therein. 

The Moore-Penrose inverse with reference to the involution of secondary-transpose
(often called the s-g inverse) belongs to a new class of generalized inverses
having several interesting properties. Given a rectangular matrix $A$ of rank $r$, it has
been noted in Eagambaram et al. [6] that the matrix can be written uniquely as a sum of
disjoint rank one factors (subject to certain conditions) with reference to the star partial
order. This decomposition is closely related to the singular values of $A$ and the
associated eigenvectors of $AA^T$. Several interesting properties of the star partial order
studied in the past lead us to introduce a new partial order, in the present paper, based
on the secondary-transpose (equivalently, with reference to the s-g inverse) and to probe
its properties.

The secondary-transpose was initially introduced on the class of square matrices
by Lee [11]. Following the definition, the secondary-transpose on the class of real
matrices has several interesting combinatorial properties. Further spectral properties
associated with the secondary-transpose were studied by Krishnamoorthy and
Vijayakumar, who also introduced the notion of the s-g inverse (see Krishnamoorthy and
Vijayakumar [9] and Krishnamoorthy and Vijayakumar [10]). However, a few examples
which countered the results presented in this literature led the authors of Varkady
et al. [20] to reconsider several research problems, and to obtain results including
an analogue of the spectral decomposition. Noting that the s-g inverse need not always exist
for a given matrix, they provided necessary and sufficient conditions for the existence
of the s-g inverse. Observing that the notion of secondary-transpose could be extended
to the class of rectangular matrices, in this article, we introduce new relations on the
class of all real rectangular matrices and study their properties analogous to the case
of the star partial order.


16.2 Preliminaries 


In this section, we introduce the notations used, and the definitions of some important
generalized inverses and matrix partial orders. We write $\mathbb{R}^{m \times n}$ to denote the class of $m \times n$
matrices over the real field. Let $A \in \mathbb{R}^{m \times n}$. We denote by $\operatorname{rank}(A)$, $\operatorname{Trace}(A)$, $A^T$, $\mathcal{C}(A)$
and $\mathcal{R}(A)$ the rank, trace, transpose, column space and row space of $A$, respectively.
For matrices $X$ and $Y$, we write $X \underset{sp}{\sim} Y$ if $\mathcal{C}(X) = \mathcal{C}(Y)$ and $\mathcal{R}(X) = \mathcal{R}(Y)$.

An $n \times m$ matrix $G$ is called a generalized inverse of $A$ if $AGA = A$. An arbitrary
generalized inverse of $A$ is denoted by $A^-$. An $n \times m$ matrix $G$ satisfying
$GAG = G$ is called an outer inverse of $A$. A generalized inverse of $A$ which is
also an outer inverse is called a reflexive generalized inverse of $A$. The unique
$n \times m$ matrix $X$ satisfying the matrix equations $AXA = A$, $XAX = X$, $(AX)^T = AX$
and $(XA)^T = XA$, often referred to as the Penrose conditions, is called the
Moore-Penrose inverse of $A$ and denoted by $A^{\dagger}$. For more details on generalized inverses,
readers are referred to Rao and Mitra [19] and Ben-Israel and Greville [4].

We refer to Krishnamoorthy and Vijayakumar [9], Krishnamoorthy and Vijayakumar
[10], Vijayakumar [21], Vijayakumar [22] and Varkady et al. [20] for the following
definitions. Throughout our discussion, $V$ denotes a square matrix of appropriate
size (depending on the context), with 1’s on the secondary diagonal and 0’s elsewhere.
Given an $m \times n$ matrix $A = (a_{ij})$ over the real field, the secondary-transpose $A^s$
of $A$ is an $n \times m$ matrix with $(i, j)$-th entry given by

$$(A^s)_{ij} = a_{m-j+1,\, n-i+1}.$$

Evidently, $A^s = V_n A^T V_m$. A square matrix $A$ is called s-symmetric if $A^s = A$. An
$m \times n$ matrix $A$ is said to be s-cancellable if $A^s A X = A^s A Y \Rightarrow AX = AY$ and
$X A A^s = Y A A^s \Rightarrow XA = YA$. A matrix $P$ is said to be an idempotent (projector)
if $P^2 = P$. A projector $P$ is called an s-symmetric projector if $P^s = P$. The class
of all s-symmetric projectors with $\mathcal{C}(P) = M$ is denoted by $\mathcal{P}(M)$. Though we
consider the matrices over the real field, the results discussed in this paper could be
extended to the case of matrices over the complex field by replacing $A^s$ with its analogue
$A^\theta = V_n A^* V_m$.
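The two descriptions of the secondary-transpose (the entrywise rule and the product with exchange matrices) are easy to compare numerically. The following sketch does so for a small example; the helper names `exchange` and `sec_transpose` are introduced here only for illustration.

```python
import numpy as np

def exchange(k):
    """The k x k matrix V with 1's on the secondary diagonal and 0's elsewhere."""
    return np.fliplr(np.eye(k))

def sec_transpose(A):
    """Secondary-transpose A^s = V_n A^T V_m of an m x n matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    return exchange(n) @ A.T @ exchange(m)

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
As = sec_transpose(A)   # 3 x 2 matrix: A reflected about its secondary diagonal
print(As)
```

Equivalently, `A[::-1, ::-1].T` produces the same matrix, which can be a convenient shortcut in numerical work.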

Now, we define the s-g inverse of a matrix by considering the Penrose conditions
with reference to the involution of secondary-transpose as given in the following.

Definition 16.1 For an $m \times n$ matrix $A$, consider the following matrix equations:
(1) $AXA = A$  (2) $XAX = X$  (3) $(AX)^s = AX$  (4) $(XA)^s = XA$.

An $n \times m$ matrix $G$, if it exists, satisfying all of the above equations is called the
Moore-Penrose inverse of $A$ with reference to the secondary-transpose (also called
the s-g inverse of $A$) and denoted by $A^{\dagger_s}$.

It may be noted that the s-g inverse of $A$, whenever it exists, is the unique reflexive
generalized inverse $G$ of $A$ such that $G \underset{sp}{\sim} A^s$. Note that if $A^{\dagger_s}$ exists, then $A$ is
s-cancellable. It is proved in Varkady et al. [20, Theorem 3.1] that the converse is also
true.


Theorem 16.1 (Varkady et al. [20]) Given an $n \times n$ matrix $A$, $A^{\dagger_s}$ exists if and only
if $\operatorname{rank}(A) = \operatorname{rank}(AA^s) = \operatorname{rank}(A^sA)$ (equivalently, $A$ is s-cancellable). Further, in
the case of $\operatorname{rank}(A) = 1$, $A^{\dagger_s}$ exists if and only if $\operatorname{Trace}(A^sA) \ne 0$, in which case,

$$A^{\dagger_s} = \frac{1}{\operatorname{Trace}(A^sA)} A^s. \tag{16.1}$$

Whenever $\operatorname{rank}(A) = r > 1$, a determinantal formula is given in Varkady et al. [20,
Theorem 3.5, Eq. 3.5] to compute the s-g inverse of $A$.
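The rank one case of Theorem 16.1 can be checked directly. The sketch below applies formula (16.1) and then tests the four equations of Definition 16.1; the helper names and the two small matrices are illustrative choices made here, not taken from Varkady et al. [20].

```python
import numpy as np

def sec_transpose(A):
    """Secondary-transpose: reflect the matrix about its secondary diagonal."""
    return np.asarray(A, dtype=float)[::-1, ::-1].T

def sg_inverse_rank_one(A, tol=1e-12):
    """Formula (16.1); returns None when Trace(A^s A) = 0."""
    A = np.asarray(A, dtype=float)
    if np.linalg.matrix_rank(A) != 1:
        raise ValueError("this sketch only covers rank-one matrices")
    As = sec_transpose(A)
    trace = np.trace(As @ A)
    return None if abs(trace) < tol else As / trace

def is_sg_inverse(A, G, tol=1e-10):
    """Check the four equations of Definition 16.1."""
    AG, GA = A @ G, G @ A
    return (np.allclose(A @ G @ A, A, atol=tol)
            and np.allclose(G @ A @ G, G, atol=tol)
            and np.allclose(sec_transpose(AG), AG, atol=tol)
            and np.allclose(sec_transpose(GA), GA, atol=tol))

A = np.array([[1.0, 2.0], [1.0, 2.0]])   # Trace(A^s A) = 8
G = sg_inverse_rank_one(A)
N = np.array([[1.0, 0.0], [0.0, 0.0]])   # Trace(N^s N) = 0, so no s-g inverse
print(G, sg_inverse_rank_one(N))
```

The second matrix illustrates that a rank one matrix may fail to have an s-g inverse, in contrast with the usual Moore-Penrose inverse, which always exists.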

The minus partial order introduced by Hartwig [7] and Nambooripad [18] dominates
most of the other partial orders defined on the rectangular matrices, and we
provide the definition of the same in the following.

Definition 16.2 (Minus partial order, Hartwig [7], Nambooripad [18]) The relation
‘$\le^-$’ on the class of real rectangular matrices, defined by
$B \le^- A$ if $BG_B = AG_B$ and $G_BB = G_BA$ for some $G_B \in \{B^-\}$, is a partial order,
and the same is called the minus partial order. In the above case, we say that $B$ is
less than $A$ under the minus partial order.

The star order was initially introduced and studied by Drazin [5] and Hartwig and
Drazin [8], respectively, without referring to the Moore-Penrose inverse.

Definition 16.3 (Star order, Drazin [5]) The relation ‘$\le^*$’ on the class
of real rectangular matrices, defined by $B \le^* A$ if $BB^T = AB^T$ and $B^TB = B^TA$,
is a partial order and the same is called the star partial order. In the above case, we
say that $B$ is less than $A$ under the star partial order.

It was also noted by Drazin [5] that $B \le^* A \Leftrightarrow BB^{\dagger} = AB^{\dagger}$ and $B^{\dagger}B = B^{\dagger}A$,
replacing ‘$T$’ by ‘$\dagger$’. Therefore, the minus partial order dominates the star order (i.e.,
$B \le^* A \Rightarrow B \le^- A$).

For more details on matrix partial orders, we refer the readers to Mitra et al. 
[17] and Bapat [3]. The following theorem describes several equivalent conditions 
describing the properties of minus partial order. For the proof, readers are referred 
to Mitra [13]. 


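Theorem 16.2 (Mitra [13]) Given the rectangular matrices $A$, $B$ and $C$ such that
$C = A - B$, the following statements are equivalent:

(i) $B$ and $A - B$ are disjoint matrices, i.e., $\mathcal{C}(B) \cap \mathcal{C}(A - B) = \{0\}$ and
$\mathcal{R}(B) \cap \mathcal{R}(A - B) = \{0\}$
(ii) $\operatorname{rank}(B) + \operatorname{rank}(A - B) = \operatorname{rank}(A)$ (rank additivity)
(iii) $B \le^- A$
(iv) $\{A^-\} \subseteq \{B^-\}$
(v) $\mathcal{C}(B) \oplus \mathcal{C}(A - B) = \mathcal{C}(A)$
(vi) $\mathcal{R}(B) \oplus \mathcal{R}(A - B) = \mathcal{R}(A)$.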
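Since the minus order is equivalent to rank additivity, it can be tested numerically without searching for a suitable generalized inverse. The sketch below checks the star order through its defining equations and the minus order through statement (ii); the function names and the illustrative pair of matrices are assumptions made here.

```python
import numpy as np

def star_leq(B, A, tol=1e-10):
    """B <=* A : B B^T = A B^T and B^T B = B^T A."""
    return (np.allclose(B @ B.T, A @ B.T, atol=tol)
            and np.allclose(B.T @ B, B.T @ A, atol=tol))

def minus_leq(B, A):
    """B <=^- A via rank additivity: rank(B) + rank(A - B) = rank(A)."""
    rank = np.linalg.matrix_rank
    return rank(B) + rank(A - B) == rank(A)

# An illustrative pair with `small` below `big` in the star order.
small = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]])
big = np.array([[1.0, 0.0, 1.0],
                [-2.0, 1.0, 2.0],
                [2.0, 1.0, -2.0]])

print(star_leq(small, big), minus_leq(small, big))   # star order implies minus order
print(star_leq(big, small))                          # the relation is not symmetric
```

Here the ranks are 2, 1 and 3 for `small`, `big - small` and `big`, so rank additivity holds, in agreement with the fact that the minus order dominates the star order.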


Later in Eagambaram et al. [6], the star order was studied with reference to
column space decomposition, and all the matrices $B$ such that $B \le^* A$ were characterized.

Motivated by the study of the star order, in Sect. 16.4, we define a new relation called
s-order on the class of real rectangular matrices, which is based on the secondary-transpose,
and a G-based relation called $\dagger_s$-order using the s-g inverse. Though the definition
of s-order is analogous to the definition of the star order, unlike the star order, s-order is not
a partial order on the entire class of real rectangular matrices. So, we investigate how
the new relation differs from or agrees with the star order. We identify a subclass of
real matrices on which the relations s-order and $\dagger_s$-order are partial orders. Also, in
Sect. 16.5, we characterize the column space of factors of given matrices in relation
to the relations considered, similar to the case of the singular value decomposition
related to the star order. In the process, we obtain a necessary and sufficient condition
for the existence of a rank 1 factor with reference to $\dagger_s$-order for a given matrix. In
Sect. 16.6, a new decomposition is studied, establishing a relationship between the s-g
inverse of a given matrix and the s-g inverses of its components.


16.3 Projectors and Generalized Inverses 


In this section, we discuss some interesting properties of s-symmetric projectors. It
may be noted that every subspace $S$ of $\mathbb{R}^n$ has an orthogonal projector onto itself.
But the same is not true in the case of s-symmetric projectors. For example, for
$S = \{(a, 0, 0)^T \in \mathbb{R}^3 : a \in \mathbb{R}\}$, the class $\mathcal{P}(S) = \emptyset$. The reason is described in the following
lemma.

Lemma 16.1 Given a subspace $M$ of $\mathbb{R}^n$ and any matrix $E$ such that $\mathcal{C}(E) = M$, we
have the following:

(i) If $P, Q \in \mathcal{P}(M)$, then $P = Q$ (the s-symmetric projector onto a given space is
unique, whenever it exists)
(ii) If $P \in \mathcal{P}(M)$, then $\mathcal{R}(P) = \mathcal{R}(E^s)$
(iii) There exists $P \in \mathcal{P}(M)$ if and only if $\operatorname{rank}(E^sE) = \operatorname{rank}(E)$, in which case,
$P = E(E^sE)^-E^s$ for any arbitrary choice of $(E^sE)^-$.

Proof For $P, Q \in \mathcal{P}(M)$, it is trivial that $PQ = Q$ and $QP = P$ as we have
$\mathcal{C}(P) = \mathcal{C}(Q) = M$. Now, both $P$ and $Q$ are s-symmetric, which implies that
$P = QP = Q^sP^s = (PQ)^s = Q^s = Q$, proving part (i) of the lemma.

From the definition of an s-symmetric matrix, we have that $P = P^s = VP^TV$ and
therefore $\mathcal{R}(P) = \mathcal{R}(P^TV) = \mathcal{C}(VP)$. Since $\mathcal{C}(E) = \mathcal{C}(P)$, we have that
$\mathcal{R}(P) = \mathcal{C}(VE) = \mathcal{R}(E^TV) = \mathcal{R}(VE^TV) = \mathcal{R}(E^s)$. Thus, part (ii) of the lemma is proved.

It may be noted from Lemma 4.3 of Manjunatha Prasad [12] that given matrices
$A$, $X$ and $Y$, there exists an outer inverse $G$ of $A$ such that $\mathcal{C}(G) = \mathcal{C}(X)$ and
$\mathcal{R}(G) = \mathcal{R}(Y)$ if and only if $\operatorname{rank}(X) = \operatorname{rank}(YAX) = \operatorname{rank}(Y)$, in which case
$G = X(YAX)^-Y$ for any choice of $(YAX)^-$. Therefore, part (iii) of the lemma
follows from the fact that $P \in \mathcal{P}(M)$ is an outer inverse of the identity matrix $I$ (in
other words, an idempotent matrix) with $\mathcal{C}(P) = \mathcal{C}(E)$ and $\mathcal{R}(P) = \mathcal{R}(E^s)$, as noted
in part (ii).
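Part (iii) of Lemma 16.1 gives a direct recipe for computing the s-symmetric projector when it exists. The sketch below follows that recipe, using the Moore-Penrose inverse from NumPy as one particular choice of generalized inverse; the function names and the two test subspaces are illustrative assumptions.

```python
import numpy as np

def sec_transpose(A):
    """Secondary-transpose: reflect the matrix about its secondary diagonal."""
    return np.asarray(A, dtype=float)[::-1, ::-1].T

def s_symmetric_projector(E):
    """Return P = E (E^s E)^- E^s if rank(E^s E) = rank(E), otherwise None."""
    E = np.asarray(E, dtype=float)
    Es = sec_transpose(E)
    if np.linalg.matrix_rank(Es @ E) != np.linalg.matrix_rank(E):
        return None                        # no s-symmetric projector onto C(E)
    return E @ np.linalg.pinv(Es @ E) @ Es

# The span of (1, 0, 0)^T has no s-symmetric projector, since E^s E = 0.
print(s_symmetric_projector([[1.0], [0.0], [0.0]]))

# The span of (1, 0, 1)^T does have one.
P = s_symmetric_projector([[1.0], [0.0], [1.0]])
print(P)
```

The first call returns `None` because $E^sE = 0$ for $E = (1, 0, 0)^T$, which is the numerical counterpart of the example given before the lemma.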


The following theorem characterizes the matrices having an s-g inverse.

Theorem 16.3 Given an $m \times n$ real matrix $A$, $A^{\dagger_s}$ exists if and only if both $\mathcal{C}(A)$ and
$\mathcal{R}(A)$ have s-symmetric projectors.

Proof If $A^{\dagger_s}$ exists, it is clear from its definition that $AA^{\dagger_s}$ and $(A^{\dagger_s}A)^T$ are the
s-symmetric projectors onto $\mathcal{C}(A)$ and $\mathcal{R}(A)$, respectively.

Conversely, if $\mathcal{C}(A)$ has an s-symmetric projector onto itself, then referring to
part (iii) of Lemma 16.1, we get that $\operatorname{rank}(A^sA) = \operatorname{rank}(A)$. On similar lines of
argument, if $\mathcal{R}(A)$ $(= \mathcal{C}(A^T))$ has an s-symmetric projector, then we get
$\operatorname{rank}(A) = \operatorname{rank}(A^T) = \operatorname{rank}((A^T)^sA^T) = \operatorname{rank}(AA^s)$. Now, the existence of $A^{\dagger_s}$ follows from
Theorem 3.1 of Varkady et al. [20], given in Theorem 16.1.


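16.4 The s-Order and $\dagger_s$-Order


We begin this section by defining s-order on the class of real rectangular matrices.

Definition 16.4 The relation ‘$\le^s$’ on the class of real rectangular matrices defined
by $B \le^s A$ if $B^sB = B^sA$ and $BB^s = AB^s$ is called s-order.

In case $B^{\dagger_s}$ exists, we have

$$\begin{aligned}
B^sB = B^sA &\Rightarrow B^{\dagger_s}(B^{\dagger_s})^sB^sB = B^{\dagger_s}(B^{\dagger_s})^sB^sA \\
&\Rightarrow (B^{\dagger_s}BB^{\dagger_s})B = (B^{\dagger_s}BB^{\dagger_s})A \quad (\text{since } BB^{\dagger_s} \text{ is s-symmetric}) \\
&\Rightarrow B^{\dagger_s}B = B^{\dagger_s}A \\
&\Rightarrow (B^sB)B^{\dagger_s}B = (B^sB)B^{\dagger_s}A \\
&\Rightarrow B^sB = B^sA \quad (\text{since } B^{\dagger_s} \underset{sp}{\sim} B^s).
\end{aligned}$$

So, we have $B^sB = B^sA$ if and only if $B^{\dagger_s}B = B^{\dagger_s}A$. On similar lines, we prove
$BB^s = AB^s$ if and only if $BB^{\dagger_s} = AB^{\dagger_s}$, whenever $B^{\dagger_s}$ exists. Therefore,

$$B \le^s A \Leftrightarrow B^{\dagger_s}B = B^{\dagger_s}A \ \text{ and } \ BB^{\dagger_s} = AB^{\dagger_s}, \tag{16.2}$$

for all matrices $B$ for which $B^{\dagger_s}$ exists.
The above Eq. 16.2 leads us to define a G-based relation $\le^{\dagger_s}$ (called $\dagger_s$-order) on
the class of rectangular matrices defined by $B \le^{\dagger_s} A$ if $B^{\dagger_s}$ exists and

$$B^{\dagger_s}B = B^{\dagger_s}A \ \text{ and } \ BB^{\dagger_s} = AB^{\dagger_s}. \tag{16.3}$$

From the definition of the $\le^{\dagger_s}$ relation and from the discussion proving Eq. 16.2, we get
that $B \le^{\dagger_s} A \Rightarrow B \le^s A$. So, the $\le^{\dagger_s}$ relation is dominated by the $\le^s$ relation.

For convenience, we denote the class of matrices having an s-g inverse by $\mathbb{R}^{\dagger_s}$ and the
class of all s-cancellable matrices by $\mathbb{R}^s$. From Theorem 16.1, we have that $\mathbb{R}^{\dagger_s} = \mathbb{R}^s$.
Further, Eq. 16.2 implies that $\le^s$ and $\le^{\dagger_s}$ coincide in this class $\mathbb{R}^s$ $(= \mathbb{R}^{\dagger_s})$. In other
words,

$$B \le^s A \Leftrightarrow B \le^{\dagger_s} A, \ \text{ for all } B \in \mathbb{R}^s = \mathbb{R}^{\dagger_s}. \tag{16.4}$$

For subspaces $M$ and $N$, we write $M \perp_s N$ to indicate that $M$ is s-orthogonal
to $N$ (for every $x \in M$ and for every $y \in N$, $y^TVx = 0$) and $M \oplus_s N$ to denote
$M \oplus N$ with $M \perp_s N$. It is important to note that $M \perp_s N$ need not imply
$M \cap N = \{0\}$. The following lemma provides different characterizations for $B \le^s A$,
which are easily verified.

Lemma 16.2 For $A = B + C$ and $B \in \mathbb{R}^{\dagger_s}$, the following statements are equivalent:

(i) $\mathcal{C}(B) \oplus_s \mathcal{C}(A - B) = \mathcal{C}(A)$ and $\mathcal{R}(B) \oplus_s \mathcal{R}(A - B) = \mathcal{R}(A)$
(ii) $B^s(A - B) = 0$ and $(A - B)B^s = 0$
(iii) $B^sB = B^sA$ and $BB^s = AB^s$
(iv) $B \le^{\dagger_s} A$
(v) $B^{\dagger_s}B = B^{\dagger_s}A$ and $BB^{\dagger_s} = AB^{\dagger_s}$.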

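It may be noted from the following example that $\le^s$ is not a partial order on the
class of rectangular matrices.

Example 1
For

$$A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

it may be noted that $A \le^s B$ and $B \le^s A$ as

$$AA^s = AB^s = A^sA = A^sB = B^sB = BB^s = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

But $A \ne B$, showing that $\le^s$ is not antisymmetric in the class of rectangular matrices.
In fact, $\le^s$ is not even a preorder in the class of real rectangular matrices. For example,
for the matrices

$$A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 0 & 0 & 4 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$

it may be noted that $A \le^s B$ and $B \le^s C$, but $A \not\le^s C$, and therefore $\le^s$ is not
transitive on the entire class of rectangular matrices.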
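The failure of antisymmetry and transitivity can be confirmed by direct computation. The sketch below implements the s-order of Definition 16.4 and applies it to the matrices of Example 1 as read here (with entries 3 and 4 in the top right corners of the second matrix of the first pair and of the last matrix, respectively); the helper names are hypothetical.

```python
import numpy as np

def sec_transpose(A):
    """Secondary-transpose: reflect the matrix about its secondary diagonal."""
    return np.asarray(A, dtype=float)[::-1, ::-1].T

def s_leq(B, A, tol=1e-10):
    """B <=^s A : B^s B = B^s A and B B^s = A B^s."""
    Bs = sec_transpose(B)
    return (np.allclose(Bs @ B, Bs @ A, atol=tol)
            and np.allclose(B @ Bs, A @ Bs, atol=tol))

# First pair: each matrix is below the other, yet they differ.
A1 = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
B1 = np.array([[0.0, 0.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
print(s_leq(A1, B1), s_leq(B1, A1), np.array_equal(A1, B1))

# Second triple: A2 <= B2 and B2 <= C2 hold, but A2 <= C2 fails.
A2 = A1
B2 = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
C2 = np.array([[0.0, 0.0, 4.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
print(s_leq(A2, B2), s_leq(B2, C2), s_leq(A2, C2))
```

In both cases the matrices with a nonzero top right entry do not have an s-g inverse, which is consistent with the order being well behaved only on the class of s-cancellable matrices.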


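Theorem 16.4 The relation ‘$\le^s$’ ($\le^{\dagger_s}$) is a partial order on the class of all rectangular
matrices having an s-g inverse (class of all s-cancellable matrices).

Proof We begin the proof with the observation that $X \le^s Y \Leftrightarrow X \le^{\dagger_s} Y$
for all $X, Y \in \mathbb{R}^{\dagger_s}$. For any $A \in \mathbb{R}^{\dagger_s}$, it is clear from the definition that $A \le^s A$,
and therefore $\le^s$ is reflexive. Now, let $A, B \in \mathbb{R}^{\dagger_s}$ be such that $A \le^s B$ and $B \le^s A$.
Note that $B \le^s A \Rightarrow BB^{\dagger_s} = AB^{\dagger_s} \Rightarrow B = BB^{\dagger_s}B = AB^{\dagger_s}B$, which implies
that $\mathcal{C}(B) \subseteq \mathcal{C}(A)$. Considering $A \le^s B \Rightarrow A^sB = A^sA$ and
$AA^{\dagger_s} = (AA^{\dagger_s})^s = (A^{\dagger_s})^sA^s$, we get that $\mathcal{C}(B) \subseteq \mathcal{C}(A)$ implies

$$A = (A^{\dagger_s})^sA^sA = (A^{\dagger_s})^sA^sB = AA^{\dagger_s}B = B.$$

Hence the antisymmetry of $\le^s$.

To prove the transitivity, consider the matrices $A, B, C \in \mathbb{R}^{\dagger_s}$ such that $A \le^s B$
and $B \le^s C$. Now, by the definition of $\le^{\dagger_s}$, we have that $AA^{\dagger_s} = BA^{\dagger_s}$ and therefore,
$BA^{\dagger_s}A = AA^{\dagger_s}A = A$. This proves that $\mathcal{C}(A) \subseteq \mathcal{C}(B)$ or equivalently,
$\mathcal{R}(A^s) \subseteq \mathcal{R}(B^s)$. Since $B^{\dagger_s} \underset{sp}{\sim} B^s$, $\mathcal{R}(A^s) \subseteq \mathcal{R}(B^s) \Rightarrow \mathcal{R}(A^{\dagger_s}) \subseteq \mathcal{R}(B^{\dagger_s})$, and, therefore,

$$A^{\dagger_s}BB^{\dagger_s} = A^{\dagger_s}. \tag{16.5}$$

On similar lines, by taking $A^{\dagger_s}A = A^{\dagger_s}B$, we obtain $\mathcal{C}(A^{\dagger_s}) \subseteq \mathcal{C}(B^{\dagger_s})$ and

$$B^{\dagger_s}BA^{\dagger_s} = A^{\dagger_s}. \tag{16.6}$$

Hence, we get

$$\begin{aligned}
A^{\dagger_s}A &= A^{\dagger_s}B \quad (\text{since } A \le^s B) \\
&= A^{\dagger_s}\big((B^{\dagger_s})^sB^sB\big) \quad (\text{since } B = BB^{\dagger_s}B = (BB^{\dagger_s})^sB) \\
&= A^{\dagger_s}(B^{\dagger_s})^sB^sC = (A^{\dagger_s}BB^{\dagger_s})C \quad (\text{since } B \le^s C \text{ and } (B^{\dagger_s})^sB^s = BB^{\dagger_s}) \\
&= A^{\dagger_s}C, \quad \text{using Eq. 16.5.}
\end{aligned}$$

On similar lines, using Eq. 16.6 and $AA^{\dagger_s} = BA^{\dagger_s}$, we get $AA^{\dagger_s} = CA^{\dagger_s}$. Thus,
$A \le^s C$, proving that $\le^s$ is transitive. Therefore, $\le^s$ is a partial order on $\mathbb{R}^{\dagger_s}$ $(= \mathbb{R}^s)$.

In view of Theorem 16.4, all the matrices involved in the inequalities with reference
to $\le^{\dagger_s}$, in the future discussion, are from the class $\mathbb{R}^s = \mathbb{R}^{\dagger_s}$, unless indicated
otherwise.

We shall now discuss some of the properties of $\le^s$ and $\le^{\dagger_s}$ in the following
theorem.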


Theorem 16.5 For any rectangular matrix $A$ and $B \in \mathbb{R}^{\dagger_s}$, the following statements
hold:

(i) $B \le^s A$ implies $B \le^- A$ (i.e., $\le^-$ dominates $\le^s$)
(ii) $B \le^s A$ if and only if $B^s \le^s A^s$
(iii) For $A \in \mathbb{R}^{\dagger_s}$, $B \le^s A$ if and only if $B^{\dagger_s} \le^s A^{\dagger_s}$
(iv) For $A \in \mathbb{R}^{\dagger_s}$, $B \le^s A$ if and only if $A^{\dagger_s}$ is a $\{1, 3, 4\}$-inverse of $B$, in which case
$B^{\dagger_s} = A^{\dagger_s}BA^{\dagger_s}$
(v) $B \le^s A$ if and only if $B^{\dagger_s}$ is a $\{2, 3, 4\}$-inverse of $A$, in which case $B = AB^{\dagger_s}A$
(vi) When $A^{\dagger_s}$ exists, $B \le^s A$ if and only if $C^{\dagger_s}$ exists and $C = A - B \le^s A$, in which
case $C^{\dagger_s} = A^{\dagger_s}CA^{\dagger_s}$ and $A^{\dagger_s} = B^{\dagger_s} + C^{\dagger_s}$.

Proof Statement (i) follows from the definition of the minus partial order and Eq. 16.2,
by noting that $B^{\dagger_s}$ is a particular generalized inverse of $B$. Statement (ii) follows
from the definition of $\le^s$ as given in Definition 16.4.

Statement (iii) follows from the fact that $B = (B^{\dagger_s})^{\dagger_s}$ and the equivalent definition
of $\le^s$ as given in Eq. 16.2.

For $B \le^s A$, we have from (i) that $B \le^- A$. Now, by (iii) $\Rightarrow$ (iv) of Theorem
16.2 we have that $A^{\dagger_s}$ is a generalized inverse of $B$. Further from (iii), we have
$B \le^s A \Leftrightarrow B^{\dagger_s} \le^s A^{\dagger_s}$ and hence $A^{\dagger_s}B = B^{\dagger_s}B$ and $BA^{\dagger_s} = BB^{\dagger_s}$, which are
s-symmetric. Thus $A^{\dagger_s}$ is a $\{1, 3, 4\}$-inverse of $B$. Conversely, if $A^{\dagger_s}$ is a $\{1, 3, 4\}$-inverse
of $B$, then $BA^{\dagger_s}$ and $(A^{\dagger_s}B)^T$ are s-symmetric projectors onto $\mathcal{C}(B)$ and $\mathcal{R}(B)$,
respectively. By the uniqueness of the s-symmetric projector onto a given space, we
must have $BA^{\dagger_s} = BB^{\dagger_s}$ and $A^{\dagger_s}B = B^{\dagger_s}B$ and therefore, $B \le^s A$. Now, the second
part of statement (iv) follows from the fact that $A^{\dagger_s}$ is a $\{1, 3, 4\}$-inverse of $B$.

From the definition, $B \le^s A \Rightarrow B^{\dagger_s}B = B^{\dagger_s}A$ and $BB^{\dagger_s} = AB^{\dagger_s}$ and therefore
the s-symmetry of $AB^{\dagger_s}$ and $B^{\dagger_s}A$ is trivial. Further, by $B^{\dagger_s}B = B^{\dagger_s}A$, we get
$B^{\dagger_s}AB^{\dagger_s} = B^{\dagger_s}BB^{\dagger_s} = B^{\dagger_s}$ and hence $B^{\dagger_s}$ is an outer inverse of $A$, thus proving
the only if part of statement (v). Now to prove the converse part, let $B^{\dagger_s}$ be a $\{2, 3, 4\}$-inverse
of $A$. Then, $B^{\dagger_s}A$ and $(AB^{\dagger_s})^T$ are s-symmetric projectors onto the spaces
$\mathcal{C}(B^{\dagger_s})$ and $\mathcal{R}(B^{\dagger_s})$, respectively. By the uniqueness of the s-symmetric projector onto
a given space, we get $B^{\dagger_s}A = B^{\dagger_s}B$ and $AB^{\dagger_s} = BB^{\dagger_s}$ and therefore $B \le^s A$, which
proves the if part of statement (v).

To prove (vi), firstly let $A$ and $B$ be matrices such that $A^{\dagger_s}$ exists and $B \le^s A$.
Using (iii) $\Rightarrow$ (iv), (v) and (vi) of Theorem 16.2, we get $BA^-C = CA^-B = O$ and
$CA^-C = C$ for all $A^-$, and so for $A^{\dagger_s}$. In other words, $A^{\dagger_s}$ is a generalized inverse of $C$
such that

$$BA^{\dagger_s}C = CA^{\dagger_s}B = 0. \tag{16.7}$$

Since $A^{\dagger_s}$ is a $\{1, 3, 4\}$-inverse of $B$ as proved in statement (iv), we get that $A^{\dagger_s}CA^{\dagger_s}$ is
the s-g inverse of $C$, satisfying $CC^{\dagger_s} = AC^{\dagger_s}$ and $C^{\dagger_s}C = C^{\dagger_s}A$ for
$C^{\dagger_s} = A^{\dagger_s}CA^{\dagger_s} = A^{\dagger_s} - B^{\dagger_s}$. This proves that $C \le^s A$. The converse part is symmetric.
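Statement (vi) of Theorem 16.5 can be illustrated on a small case. In the sketch below the matrices are chosen here for illustration and do not come from the source: $A$ is a nonsingular diagonal matrix (so its ordinary inverse serves as the s-g inverse), and $B$ and $C = A - B$ are rank one matrices whose s-g inverses come from formula (16.1). The helper names are hypothetical.

```python
import numpy as np

def sec_transpose(A):
    """Secondary-transpose: reflect the matrix about its secondary diagonal."""
    return np.asarray(A, dtype=float)[::-1, ::-1].T

def s_leq(B, A, tol=1e-10):
    """B <=^s A : B^s B = B^s A and B B^s = A B^s."""
    Bs = sec_transpose(B)
    return (np.allclose(Bs @ B, Bs @ A, atol=tol)
            and np.allclose(B @ Bs, A @ Bs, atol=tol))

def is_sg_inverse(A, G, tol=1e-10):
    """Check the four equations of Definition 16.1."""
    AG, GA = A @ G, G @ A
    return (np.allclose(A @ G @ A, A, atol=tol)
            and np.allclose(G @ A @ G, G, atol=tol)
            and np.allclose(sec_transpose(AG), AG, atol=tol)
            and np.allclose(sec_transpose(GA), GA, atol=tol))

A = np.diag([2.0, 3.0])
B = np.array([[1.0, 1.5], [1.0, 1.5]])
C = A - B

A_sg = np.linalg.inv(A)                                    # A is nonsingular
B_sg = sec_transpose(B) / np.trace(sec_transpose(B) @ B)   # rank one, formula (16.1)
C_sg = sec_transpose(C) / np.trace(sec_transpose(C) @ C)   # rank one, formula (16.1)

print(s_leq(B, A), s_leq(C, A))
print(np.allclose(A_sg, B_sg + C_sg), np.allclose(C_sg, A_sg @ C @ A_sg))
```

Both printed lines should show `True True`, matching the decomposition $A^{\dagger_s} = B^{\dagger_s} + C^{\dagger_s}$ of statement (vi) for this particular triple.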


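Remark 16.1 As noted in Theorem 16.5 (i), the s-order is dominated by the minus
partial order. However, the s-order neither dominates the star order nor is dominated
by the star order. This can be seen from the following examples.

Example 2
For

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 & 1 \\ -2 & 1 & 2 \\ 2 & 1 & -2 \end{pmatrix},$$

we have $\operatorname{rank}(AA^s) = \operatorname{rank}(A^sA) = \operatorname{rank}(A)$ and $B$ is invertible. So, by Theorem
16.1 it follows that $A, B \in \mathbb{R}^{\dagger_s}$. Further, we observe that

$$AA^T = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} = BA^T \quad \text{and} \quad A^TA = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix} = A^TB,$$

and hence $A \le^* B$. But we observe that

$$A^sA = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix} \ne \begin{pmatrix} 2 & 1 & -2 \\ -1 & 1 & 3 \\ 2 & 1 & -2 \end{pmatrix} = A^sB,$$

which shows that $A \not\le^s B$.

Example 3
Now, consider the example of

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 2 \\ 1 & 1 & -1 \\ 0 & 1 & 0 \end{pmatrix},$$

and observe that $A, B \in \mathbb{R}^{\dagger_s}$ as $\operatorname{rank}(AA^s) = \operatorname{rank}(A^sA) = \operatorname{rank}(A) = 2$ and
$\operatorname{rank}(BB^s) = \operatorname{rank}(B^sB) = \operatorname{rank}(B) = 3$. Further, we have

$$AA^s = \begin{pmatrix} 0 & 0 & 2 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} = BA^s \quad \text{and} \quad A^sA = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix} = A^sB.$$

Therefore, $A \le^s B$. But in this case

$$A^TA = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix} \ne \begin{pmatrix} 0 & 0 & 2 \\ 1 & 2 & -1 \\ 0 & 0 & 2 \end{pmatrix} = A^TB,$$

which shows that $A \not\le^* B$.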
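The two examples can be replayed numerically with the defining equations of the star order and the s-order. The matrices below are those of Examples 2 and 3 as read here; the function names are hypothetical helpers.

```python
import numpy as np

def sec_transpose(A):
    """Secondary-transpose: reflect the matrix about its secondary diagonal."""
    return np.asarray(A, dtype=float)[::-1, ::-1].T

def star_leq(B, A, tol=1e-10):
    """B <=* A : B B^T = A B^T and B^T B = B^T A."""
    return (np.allclose(B @ B.T, A @ B.T, atol=tol)
            and np.allclose(B.T @ B, B.T @ A, atol=tol))

def s_leq(B, A, tol=1e-10):
    """B <=^s A : B^s B = B^s A and B B^s = A B^s."""
    Bs = sec_transpose(B)
    return (np.allclose(Bs @ B, Bs @ A, atol=tol)
            and np.allclose(B @ Bs, A @ Bs, atol=tol))

A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
B_ex2 = np.array([[1.0, 0.0, 1.0], [-2.0, 1.0, 2.0], [2.0, 1.0, -2.0]])
B_ex3 = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, -1.0], [0.0, 1.0, 0.0]])

# Example 2: below in the star order, but not in the s-order.
print(star_leq(A, B_ex2), s_leq(A, B_ex2))
# Example 3: below in the s-order, but not in the star order.
print(s_leq(A, B_ex3), star_leq(A, B_ex3))
```

The same matrix $A$ appears in both examples, so the two lines together show that neither order implies the other.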


Remark 16.2 It is well known that if $B \le^* A$, then $C = A - B \le^* A$. Though the
same is true in the case of $\le^s$ for matrices in the class $\mathbb{R}^{\dagger_s}$, it is not the case in
general. It may be noted that Theorem 16.5 (vi) may fail to hold if we relax
the condition that $A^{\dagger_s}$ exists. This can be seen in the following example.

Example 4
Consider the matrix

$$A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},$$

for which $\operatorname{rank}(A) = 2 \ne 1 = \operatorname{rank}(A^sA) = \operatorname{rank}(AA^s)$ and hence $A^{\dagger_s}$ does not
exist. Now, for $B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ we have $B \in \mathbb{R}^{\dagger_s}$ and $B \le^s A$, as

$$BB^s = AB^s = B^sB = B^sA = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Further, we have $C = A - B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}$. Note that $C^{\dagger_s}$ does not exist as
$C^sC = O \Rightarrow \operatorname{rank}(C^sC) = 0 \ne 1 = \operatorname{rank}(C)$.

Remark 16.3 Given a complex matrix $A$, the conjugate transpose of $A$ denoted by
$A^*$ is well known in the literature. Similarly, the involution of conjugate secondary-transpose
on the class of complex rectangular matrices is defined by $A^\theta = VA^*V$, for
any complex matrix $A$. The $\theta$-order on the class of all complex rectangular matrices
is defined on similar lines of s-order by replacing ‘$s$’ with ‘$\theta$’ in the definition. The
properties of $\theta$-order are similar to those of s-order discussed in this section.


16.5 Factors with Reference to the Order Relations 


In Eagambaram et al. [6], it is noted that for a given rectangular matrix $A$ and a
proper subspace $M$ of $\mathcal{C}(A)$, there always exists a matrix $B$ with $\mathcal{C}(B) = M$ such
that $B \le^- A$. There are infinitely many such choices of $B$. Further, the authors of the
same paper have characterized the subspaces $M$ of $\mathcal{C}(A)$ for which there exists a matrix
$B$ with $\mathcal{C}(B) = M$ and $B \le^* A$. Further, note that such a matrix $B$, whenever it exists,
is unique. Motivated by this, in this section, we probe the problem of characterizing
the matrices $B \in \mathbb{R}^{\dagger_s}$, if any exist, such that $B \le^s A$ with $\mathcal{C}(B) = M$, where $A$ is an
arbitrary rectangular matrix, not necessarily with $A \in \mathbb{R}^{\dagger_s}$. In general, such a $B$ does
not exist, which can be seen from Example 5 below.

Example 5
Consider the matrix $A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$ (invertible) and $M = \operatorname{span}\left\{\begin{pmatrix} 2 \\ 1 \end{pmatrix}\right\}$. Any matrix
$B$ with $\mathcal{C}(B) = M$ is of the form $B = \begin{pmatrix} 2a & 2b \\ a & b \end{pmatrix}$, where $a$ and $b$ are real numbers,
not both zero. If $B \le^s A$, in which case

$$B^sB = \begin{pmatrix} 4ab & 4b^2 \\ 4a^2 & 4ab \end{pmatrix} = \begin{pmatrix} 0 & 3b \\ 0 & 3a \end{pmatrix} = B^sA,$$

then $a = 0$ and $b = \frac{3}{4}$. In this case $BB^s = 0 \ne AB^s$, a contradiction.


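In the following theorem, we prove that the matrix $B \in \mathbb{R}^{\dagger_s}$ such that $B \le^s A$
with $\mathcal{C}(B) = M$, if it exists, is uniquely determined, as in the case of the star order.

Theorem 16.6 If $B, C \in \mathbb{R}^{\dagger_s}$ are such that $B, C \le^s A$ and $\mathcal{C}(B) = \mathcal{C}(C)$, then $B = C$.

Proof Since $\mathcal{C}(B) = \mathcal{C}(C)$, from Lemma 16.1 the s-symmetric projectors onto $\mathcal{C}(B)$
and $\mathcal{C}(C)$ given by $BB^{\dagger_s}$ and $CC^{\dagger_s}$, respectively, are the same. Therefore,

$$\begin{aligned}
B &= BB^{\dagger_s}B \\
&= (BB^{\dagger_s})A \quad (\text{since } B \le^s A) \\
&= CC^{\dagger_s}A \quad (\text{since } BB^{\dagger_s} = CC^{\dagger_s}) \\
&= C(C^{\dagger_s}C) \quad (\text{since } C \le^s A) \\
&= C,
\end{aligned}$$

proving the theorem.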


The following theorem provides a few necessary and sufficient conditions for the existence of a matrix $B <^{s} A$ such that $C(B) = M \subseteq C(A)$.


Theorem 16.7 For any rectangular matrix $A$ and $M \subseteq C(A)$, consider the following statements:


(i) There exists a matrix $B \in R^{\dagger_s}$ such that $B <^{s} A$ with $C(B) = M$
(ii) There exists $P \in \mathcal{P}(M)$ such that $PAA^{s}(I - P) = 0$
(iii) There exists $P \in \mathcal{P}(M)$ such that $P$ commutes with $AA^{s}$
(iv) There exists $P \in \mathcal{P}(M)$ such that $PAA^{s}$ is s-symmetric.


Then (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv) holds. Further, (iv) $\Rightarrow$ (i) holds if $A^{\dagger_s}$ exists.


Proof To prove (i) $\Rightarrow$ (ii), consider $B \in R^{\dagger_s}$ as given in (i). Set $P = BB^{\dagger_s}$, the s-projector onto the space $M$. By substitution, we get $PA = B(B^{\dagger_s}A) = BB^{\dagger_s}B = B$, as $B <^{s} A$. Now, $B <^{s} A \Rightarrow BB^{s} = BA^{s} \Rightarrow PAA^{s}P = BB^{s} = BA^{s} = PAA^{s}$, proving (ii).

(ii) $\Rightarrow$ (iii) is trivial, by noting that $PAA^{s}P$ is s-symmetric.

Now let $P \in \mathcal{P}(M)$ be as given in (iii). Then $PAA^{s} = AA^{s}P = (PAA^{s})^{s}$, proving (iv). Hence (iii) $\Rightarrow$ (iv).

Now, let $P$ be the s-projector onto the space $M$ satisfying (iv), i.e., $PAA^{s}$ is s-symmetric. Set $B = PA$. Observe that

$$C(B) \subseteq C(A) \quad \text{and} \quad R(B) \subseteq R(A), \tag{16.8}$$

as $C(B) = M = C(P) \subseteq C(A)$. Further,

$$BB^{s} = PAA^{s}P = (PAA^{s})^{s}P = AA^{s}PP = AA^{s}P = AB^{s} \tag{16.9}$$

and

$$B^{s}B = A^{s}PPA = A^{s}PA = A^{s}B = B^{s}A, \tag{16.10}$$

which show that $B <^{s} A$. Now, to complete the proof of (iv) $\Rightarrow$ (i), we need to prove that $B \in R^{\dagger_s}$ whenever $A \in R^{\dagger_s}$. From Eq. 16.8, we get $C(B^{s}) \subseteq C(A^{s})$, and using Eq. 16.9 we get that $A^{\dagger_s}BB^{s} = A^{\dagger_s}AB^{s} = B^{s}$ (because $A^{\dagger_s}AA^{s} = A^{s}$), proving that $\operatorname{rank}(BB^{s}) = \operatorname{rank}(B^{s}) = \operatorname{rank}(B)$. Similarly, from Eqs. 16.8 and 16.10, we get that $\operatorname{rank}(B^{s}B) = \operatorname{rank}(B)$. Now, referring to Theorem 16.1, we get that $B \in R^{\dagger_s}$. Hence, (iv) $\Rightarrow$ (i).


Remark 16.4 If we drop the condition that $A^{\dagger_s}$ exists in (iv) $\Rightarrow$ (i) of Theorem 16.7, then (iv) may not imply (i), as seen in the following example.


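Example 6

Consider the matrix

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad M = \operatorname{span}\left\{ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\} \subseteq C(A).$$

The s-symmetric projector $P$ onto $M$ is given by

$$P = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Then $P$ commutes with $AA^{s}$. Further, we have that

$$B = PA = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

for which it is easily verified that $B <^{s} A$. However, note that $B \notin R^{\dagger_s}$, as $\operatorname{rank}(B^{s}B) = 1 \neq 0 = \operatorname{rank}(BB^{s})$. Therefore, (iv) does not imply (i) for the matrix $A$ and the space $M$.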
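The computations in Example 6 can be checked mechanically. The following is a minimal sketch in plain Python, assuming the secondary transpose of an $m \times n$ matrix is $A^{s} = V_n A^{T} V_m$ (with $V$ the reversal permutation matrix) and that $B <^{s} A$ means $B^{s}B = B^{s}A$ and $BB^{s} = AB^{s}$, as used in the proof of Theorem 16.7. The helper names `secondary_transpose`, `matmul`, `rank` and `s_below` are illustrative only and do not come from the chapter.

```python
from fractions import Fraction


def secondary_transpose(a):
    """Secondary transpose A^s = V_n A^T V_m of an m x n matrix A."""
    m, n = len(a), len(a[0])
    # Entry (i, j) of A^s is entry (m-1-j, n-1-i) of A (0-based indices).
    return [[a[m - 1 - j][n - 1 - i] for j in range(m)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(a):
    """Rank computed by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def s_below(b, a):
    """Check the two defining equations of the s-order, B <^s A."""
    bs = secondary_transpose(b)
    return matmul(bs, b) == matmul(bs, a) and matmul(b, bs) == matmul(a, bs)


# Data of Example 6.
A = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
P = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
B = matmul(P, A)
B_s = secondary_transpose(B)
AA_s = matmul(A, secondary_transpose(A))

p_commutes = matmul(P, AA_s) == matmul(AA_s, P)   # P commutes with A A^s
b_below_a = s_below(B, A)                         # B <^s A
rank_BsB = rank(matmul(B_s, B))                   # rank(B^s B)
rank_BBs = rank(matmul(B, B_s))                   # rank(B B^s)
```

With these definitions the two ranks differ, which is exactly the obstruction to $B$ having an s-g inverse noted in the example.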


The following theorem characterizes the subspaces $M$ of $C(A)$ for which there exists a matrix $B$ with $C(B) = M$ and $B <^{\dagger_s} A$.


Theorem 16.8 For any rectangular matrix $A$ and $M \subseteq C(A)$, the following statements are equivalent:


(i) There exists a matrix $B$ such that $B <^{\dagger_s} A$ with $C(B) = M$
(ii) There exists $P \in \mathcal{P}(M)$ such that $PAA^{s}(I - P) = 0$ and $\operatorname{rank}(AA^{s}P) = \operatorname{rank}(PAA^{s}) = \operatorname{rank}(P)$
(iii) $M$ is an invariant subspace of $AA^{s}$ having an s-symmetric projector (i.e., $AA^{s}(M) = M$ and $\mathcal{P}(M) \neq \emptyset$).


Further, in the above case the matrix $B$ is given by

$$B = E(E^{s}E)^{-}E^{s}A \tag{16.11}$$

for any choice of $(E^{s}E)^{-}$.


Proof For the matrix $B$ satisfying (i), the existence of $P \in \mathcal{P}(M)$ such that $PAA^{s}(I - P) = 0$ is proved as in the case of (i) $\Rightarrow$ (ii) of Theorem 16.7, in which case $P = BB^{\dagger_s}$. The second part of (ii) follows from the fact that $B$ has an s-g inverse, in which case $\operatorname{rank}(BB^{s}) = \operatorname{rank}(B) = \operatorname{rank}(P)$. Hence (i) $\Rightarrow$ (ii).

If $P$ is an s-symmetric projector satisfying (ii), then for the matrix $B = PA$, in which case $C(B) = C(P)$, the relation $B <^{s} A$ is proved as in the case of (iv) $\Rightarrow$ (i) of Theorem 16.7. Since $P$ is an s-symmetric projector onto $C(B) = C(P)$, it follows from Lemma 16.1 (iii) that $\operatorname{rank}(B^{s}B) = \operatorname{rank}(B)$. Now, $\operatorname{rank}(BB^{s}) = \operatorname{rank}(B)$ follows from the fact that $PAA^{s}P = AA^{s}P$ and $\operatorname{rank}(AA^{s}P) = \operatorname{rank}(P)$, as given in (ii). Therefore, $B^{\dagger_s}$ exists and hence it follows from Eq. 16.4 that $B <^{\dagger_s} A$. Hence (ii) $\Rightarrow$ (i).

Note that

$$PAA^{s}P = AA^{s}P = PAA^{s} \iff C(AA^{s}P) \subseteq C(P).$$

Further, the equality on the right-hand side holds if and only if $\operatorname{rank}(AA^{s}P) = \operatorname{rank}(P)$, thus proving (iii) $\Leftrightarrow$ (ii).

Further, Eq. 16.11 follows from the expression for the s-symmetric projector as given in Lemma 16.1 (iii).


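Remark 16.5 If we relax the condition $\operatorname{rank}(AA^{s}P) = \operatorname{rank}(PAA^{s}) = \operatorname{rank}(P)$ in (ii) of Theorem 16.8, then (ii) $\Rightarrow$ (i) may no longer hold. To observe this, consider the following example.


Example 7

For the matrix

$$A = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad M = \operatorname{span}\left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right\} \subseteq C(A),$$

we have that

$$P = \begin{pmatrix} \tfrac{1}{2} & 0 & 0 & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \tfrac{1}{2} & 0 & 0 & 0 & \tfrac{1}{2} \end{pmatrix} \in \mathcal{P}(M).$$

Observe that $\operatorname{rank}(AA^{s}P) = \operatorname{rank}(PAA^{s}) = 0 \neq 2 = \operatorname{rank}(P)$. Now, for

$$B = PA = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},$$

it is easily verified that $B <^{s} A$. However, note that $B \notin R^{\dagger_s}$, as $\operatorname{rank}(BB^{s}) = 0 \neq 2 = \operatorname{rank}(B^{s}B)$. Therefore, for the matrix $A$ and the space $M$, (ii) $\Rightarrow$ (i) of Theorem 16.8 fails once the rank condition is dropped.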


The following theorem characterizes the rank 1 factors of a given matrix under $<^{\dagger_s}$, and the proof follows as a corollary to (i) $\Leftrightarrow$ (iii) of Theorem 16.8.


Theorem 16.9 For any rectangular matrix $A$ and a vector $0 \neq x \in C(A)$, there exists a matrix $B \in R^{\dagger_s}$ such that $B <^{s} A$ and $C(B) = \operatorname{span}\{x\}$ if and only if $x$ is an eigenvector of $AA^{s}$ and $\operatorname{span}\{x\}$ has an s-symmetric projector.


Remark 16.6 Observe that for a given eigenvector $x$ of $AA^{s}$, the existence of a matrix $B <^{\dagger_s} A$ (or $B <^{s} A$) such that $C(B) = \operatorname{span}\{x\}$ is not guaranteed. This can be seen in Example 8 below.


Example 8

Consider the matrix

$$A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad \text{for which} \quad x = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

is an eigenvector of $AA^{s} = I$. Any matrix $B$ with $C(B) = \operatorname{span}\{x\}$ is of the form

$$B = \begin{pmatrix} a & b & c \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where $a, b, c$ are not all zero. Clearly, $B <^{\dagger_s} A$ is not possible unless $a, b, c$ are all zero, which is a contradiction. Similarly, for $B <^{s} A$, the conditions $B^{s}B = B^{s}A$ and $BB^{s} = AB^{s}$ imply that $a = b = c = 0$, a contradiction to the fact that $C(B) = \operatorname{span}\{x\}$.


Remark 16.7 For a given rectangular matrix $A$, Theorem 16.9 does not assure the existence of any eigenvector $x$ of $AA^{s}$ for which $\operatorname{span}\{x\}$ has an s-symmetric projector onto itself. To observe this, we consider the following example.


Example 9

Consider the matrix

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{for which} \quad AA^{s} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

The eigenvalues of $AA^{s}$ are 0 with multiplicity one and 1 with multiplicity two. The eigenspace corresponding to the eigenvalue 1 has dimension 1 and is spanned by the vector $x = (1, 0, 0)^{T}$, for which $\operatorname{span}\{x\}$ does not have an s-symmetric projector, and hence there is no matrix $B$ of rank 1 such that $B <^{\dagger_s} A$. Further, there is no matrix $B$ of rank 1 with $C(B) = \operatorname{span}\{x\}$ such that $B <^{s} A$, as seen in the earlier example.


16.6 A New Decomposition 


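The spectral decomposition of a symmetric matrix is very useful in computing the Moore-Penrose inverse, but the analogous decomposition given in Theorem 2.1 of Varkady et al. [20] for an s-symmetric matrix is not so promising in computing the s-g inverse, as seen in the following example.


Example 10

For the matrix $A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$, the decomposition

$$A = (1)\begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 0 \end{pmatrix} + (-1)\begin{pmatrix} 0 & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 0 \end{pmatrix},$$

as given in Theorem 2.1 of Varkady et al. [20], is in the form $\lambda_1 A_1 + \lambda_2 A_2$, where $\lambda_1$ and $\lambda_2$ are s-eigenvalues of $A$ and $A_1$, $A_2$ are s-idempotent. Further, $A_1$ and $A_2$ have s-g inverses, as $\operatorname{rank}(A_i^{s}A_i) = \operatorname{rank}(A_iA_i^{s}) = \operatorname{rank}(A_i) = 1$, for $i = 1, 2$. By using the expression for the s-g inverse of a rank 1 matrix as given in Theorem 16.1, we get

$$A_1^{\dagger_s} = \begin{pmatrix} 0 & 2 & 2 \\ 0 & 2 & 2 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad A_2^{\dagger_s} = \begin{pmatrix} 0 & -2 & 2 \\ 0 & 2 & -2 \\ 0 & 0 & 0 \end{pmatrix},$$

but $A$ has no s-g inverse. In fact, the matrix $\frac{1}{\lambda_1}A_1^{\dagger_s} + \frac{1}{\lambda_2}A_2^{\dagger_s} = (1)A_1^{\dagger_s} + (-1)A_2^{\dagger_s}$ is not even a generalized inverse of $A$: it equals $4A$, so that

$$A\bigl(A_1^{\dagger_s} + (-1)A_2^{\dagger_s}\bigr)A = 4A^{3} = 0 \neq A.$$

This observation leads us to explore, in this section, a useful decomposition which relates the s-g inverse of a given matrix to the s-g inverses of its components.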
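The claims of Example 10 can be confirmed with exact rational arithmetic. The sketch below assumes that an s-g inverse $G$ of $A$ is characterized by the four Penrose-type conditions $AGA = A$, $GAG = G$, $(AG)^{s} = AG$ and $(GA)^{s} = GA$; this characterization, and the helper names, are assumptions made for illustration rather than statements taken from the chapter.

```python
from fractions import Fraction


def secondary_transpose(a):
    """Secondary transpose A^s = V A^T V (entry (i, j) comes from (m-1-j, n-1-i))."""
    m, n = len(a), len(a[0])
    return [[a[m - 1 - j][n - 1 - i] for j in range(m)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, t):
    """Return a + t * b entrywise."""
    return [[x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def is_sg_inverse(a, g):
    """Assumed characterization: four Penrose-type conditions with the secondary transpose."""
    ag, ga = matmul(a, g), matmul(g, a)
    return (matmul(ag, a) == a and matmul(ga, g) == g
            and secondary_transpose(ag) == ag and secondary_transpose(ga) == ga)


half = Fraction(1, 2)
A1 = [[0, half, half], [0, half, half], [0, 0, 0]]
A2 = [[0, -half, half], [0, half, -half], [0, 0, 0]]
G1 = [[0, 2, 2], [0, 2, 2], [0, 0, 0]]      # claimed s-g inverse of A1
G2 = [[0, -2, 2], [0, 2, -2], [0, 0, 0]]    # claimed s-g inverse of A2

A = add_scaled(A1, A2, -1)                  # A = (1) A1 + (-1) A2
G = add_scaled(G1, G2, -1)                  # candidate (1/1) G1 + (1/(-1)) G2

components_ok = is_sg_inverse(A1, G1) and is_sg_inverse(A2, G2)
candidate_is_g_inverse = matmul(matmul(A, G), A) == A
```

Here `components_ok` evaluates to `True` while `candidate_is_g_inverse` evaluates to `False`, in line with the conclusion of the example.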


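Theorem 16.10 If $A_1, A_2, \ldots, A_k$ are any matrices such that

$$A = \sum_{i=1}^{k} A_i, \quad \text{and} \quad A_iA_j^{s} = 0,\ A_i^{s}A_j = 0 \ \text{ for } i \neq j,$$

then the following statements are equivalent:


(i) $A^{\dagger_s}$ exists and $\sum_{i=1}^{k} \operatorname{rank}(A_i) = \operatorname{rank}(A)$
(ii) $A_i^{\dagger_s}$ exists for all $i = 1, 2, \ldots, k$.


In the above case, we have

$$A^{\dagger_s} = \sum_{i=1}^{k} A_i^{\dagger_s} \quad \text{and} \quad A_i^{\dagger_s} = A^{\dagger_s}A_iA^{\dagger_s}. \tag{16.12}$$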


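Proof Let $A = \sum_{i=1}^{k} A_i$ be such that $A_iA_j^{s} = 0$ and $A_i^{s}A_j = 0$ for $i \neq j$, as given in the theorem, and satisfy (i). To prove (i) $\Rightarrow$ (ii), we shall prove that $\operatorname{rank}(A_i^{s}A_i) = \operatorname{rank}(A_iA_i^{s}) = \operatorname{rank}(A_i)$, for each $i$.

Since $A^{\dagger_s}$ exists, we have that $\operatorname{rank}(AA^{s}) = \operatorname{rank}(A^{s}A) = \operatorname{rank}(A)$. Therefore,

$$\begin{aligned}
\operatorname{rank}(A) = \operatorname{rank}(AA^{s}) &= \operatorname{rank}\Bigl(\sum_{i=1}^{k} A_iA_i^{s}\Bigr) \qquad (\because\ A_iA_j^{s} = 0 \text{ for } i \neq j) \\
&\le \sum_{i=1}^{k} \operatorname{rank}(A_iA_i^{s}) \\
&\le \sum_{i=1}^{k} \operatorname{rank}(A_i) = \operatorname{rank}(A),
\end{aligned}$$

where the last equality is the rank additivity given in (i). Therefore, we get

$$\sum_{i=1}^{k} \operatorname{rank}(A_iA_i^{s}) = \sum_{i=1}^{k} \operatorname{rank}(A_i), \tag{16.13}$$

which, since $\operatorname{rank}(A_iA_i^{s}) \le \operatorname{rank}(A_i)$ for each $i$, in turn implies that $\operatorname{rank}(A_iA_i^{s}) = \operatorname{rank}(A_i)$. Similarly, we prove that $\operatorname{rank}(A_i^{s}A_i) = \operatorname{rank}(A_i)$. Thus, $A_i^{\dagger_s}$ exists for each $i$ and hence (i) $\Rightarrow$ (ii).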

Now let A;"* exist for each i, satisfying (ii). Equation 16.12 is easily verified 
by substitution and using the fact that Aj; = A,’ and given that Aj Aj = 0 and 


A;‘A; =0 fori 4 j. Thus, A‘ exists and it is a generalized inverse of A; satisfying 
A; A‘ A; = 0, for each i and j #i. 

Now, for $i = 1$ we have that $G = A^{\dagger_s}A_1A^{\dagger_s} \in \{A_1^{-}\}$ such that $A_1G = AG$ and $GA_1 = GA$. Referring to the definition of $<^{-}$, we have that $A_1 <^{-} A$, and then

$$\operatorname{rank}(A) = \operatorname{rank}(A - A_1) + \operatorname{rank}(A_1)$$

follows from Theorem 16.2. Now, noting that $A - A_1 = A_2 + A_3 + \cdots + A_k$ such that $A_iA_j^{s} = 0$ and $A_i^{s}A_j = 0$ for $i, j \neq 1$, $i \neq j$, (ii) $\Rightarrow$ (i) follows by induction.


Remark 16.8 It may be noted that (ii) of Theorem 16.10 need not hold if we drop the condition of rank additivity in (i), which can be seen from the following example.


Example 11

Consider the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

which is an s-symmetric matrix and satisfies the condition $\operatorname{rank}(A^{s}A) = \operatorname{rank}(AA^{s}) = \operatorname{rank}(A^{2}) = 2 = \operatorname{rank}(A)$. Therefore, $A^{\dagger_s}$ exists. Now, consider the decomposition

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = A_1 + A_2 \ \text{(say)},$$

which satisfies the condition given in Theorem 16.10 but does not satisfy the rank additivity property given in (i). It may be noted that (ii) does not hold, as $A_2^{\dagger_s}$ does not exist.


The decomposition $A = A_1 + A_2$ given in the above example also demonstrates that $A_2 <^{s} A$ together with the existence of $A^{\dagger_s}$ need not imply that $A_2^{\dagger_s}$ exists.
The following example demonstrates Theorem 16.10.


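Example 12

Consider

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

with a decomposition

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} = A_1 + A_2 \ \text{(say)},$$

satisfying the condition $0 = A_1^{s}A_2 = A_1A_2^{s}$, given in Theorem 16.10.
We also observe that $\operatorname{rank}(A) = 3 = \operatorname{rank}(A^{s}A) = \operatorname{rank}(AA^{s})$. Therefore, $A^{\dagger_s}$ exists. Further, we have

$$3 = \operatorname{rank}(A) = \operatorname{rank}(A_1) + \operatorname{rank}(A_2) = 2 + 1,$$

and hence the decomposition satisfies (i) of the theorem. In fact, using the expression given in Eq. 3.5 of Varkady et al. [20, Theorem 3.5] for the s-g inverse, we get that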


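$$A^{\dagger_s} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

It is verified easily by substitution that the matrices $A_1^{s}A_1$ and $A_1A_1^{s}$ are of rank two, and the matrices $A_2^{s}A_2$ and $A_2A_2^{s}$ are of rank one. Therefore, $A_1^{\dagger_s}$ and $A_2^{\dagger_s}$ exist. In fact,

$$A_1^{\dagger_s} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & -2 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad A_2^{\dagger_s} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Thus, (ii) of Theorem 16.10 is satisfied. Further, we observe that $A^{\dagger_s} = A_1^{\dagger_s} + A_2^{\dagger_s}$.


Chapter 17
On Products of Graph Matrices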


G. Sudhakara, Vinay Madhusudanan, and K. Arathi Bhat 


Mathematics Subject Classification (2010) 05C50 · 05C76


17.1 Introduction 


All graphs considered in this article are assumed to be simple and undirected. Let $G$ be a graph with vertex set $V(G) = \{v_1, v_2, \ldots, v_n\}$ and edge set $E(G) = \{e_1, e_2, \ldots, e_m\}$. Then $n$ is the order and $m$ is the size of $G$. Two vertices $v_i$ and $v_j$ are adjacent in $G$ if there is an edge between them, and this is denoted by $v_i \sim_G v_j$. If $v_i$ and $v_j$ are non-adjacent in $G$, we write $v_i \nsim_G v_j$. The complete graph on $n$ vertices, denoted by $K_n$, is the graph in which every two vertices are adjacent, and its complement $\overline{K_n}$ is called the totally disconnected graph. If $G$ and $H$ are two graphs, then $G \cup H$ is the graph with $V(G \cup H) = V(G) \cup V(H)$ and $E(G \cup H) = E(G) \cup E(H)$. A cycle graph on $n$ vertices, denoted by $C_n$, is a connected 2-regular graph. An isolated vertex is a vertex with degree zero, and a full degree vertex is a vertex that is adjacent to all other vertices. A tree on $n$ vertices is a connected acyclic graph. An end vertex or a pendant vertex is a vertex of degree 1. A path graph is a tree with exactly two end vertices.

Let $G$ be a graph with $V(G) = \{v_1, v_2, \ldots, v_n\}$ and $E(G) = \{e_1, e_2, \ldots, e_m\}$. The adjacency matrix $A(G)$ of $G$ is the $n \times n$ matrix whose rows and columns are indexed by the vertex set of $G$ and whose $(i, j)$-entry, $a_{ij}$, is given by

$$a_{ij} = \begin{cases} 0 & \text{if } v_i \nsim_G v_j \\ 1 & \text{if } v_i \sim_G v_j. \end{cases}$$

Thus, $A(G)$ is a symmetric $(0, 1)$-matrix whose diagonal entries are all equal to zero.

The incidence matrix of $G$, denoted by $B(G)$, is the $n \times m$ matrix whose rows are indexed by the vertex set and columns are indexed by the edge set of $G$. The $(i, j)$-element of $B(G)$ is given by

$$b_{ij} = \begin{cases} 0 & \text{if } e_j \text{ is not incident with the vertex } v_i \\ 1 & \text{if } e_j \text{ is incident with the vertex } v_i. \end{cases}$$

The $n \times m$ matrix in which every entry is 1 is denoted by $J_{n \times m}$. A permutation matrix of order $n$ is an $n \times n$ matrix which contains exactly one 1 in each row and one 1 in each column.
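As a small illustration of these definitions, the following sketch builds the adjacency and incidence matrices of a graph given as an edge list. The functions `adjacency_matrix` and `incidence_matrix`, the 0-based vertex labels and the choice of the cycle $C_4$ are all illustrative assumptions. The final check uses the standard identity $B(G)B(G)^{T} = A(G) + D(G)$, where $D(G)$ is the diagonal matrix of vertex degrees; this identity is a well-known fact not stated in the text above.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix A(G) of a simple graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def incidence_matrix(n, edges):
    """Incidence matrix B(G): rows are vertices, columns are edges."""
    b = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        b[u][j] = b[v][j] = 1
    return b


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Illustrative graph: the cycle C_4 on vertices 0, 1, 2, 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = adjacency_matrix(4, edges)
B = incidence_matrix(4, edges)

# B B^T has the vertex degrees on the diagonal and A(G) off the diagonal.
BBt = matmul(B, transpose(B))
degrees = [sum(row) for row in A]
A_plus_D = [[A[i][j] + (degrees[i] if i == j else 0) for j in range(4)] for i in range(4)]
```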


17.1.1 Product of Graphs (Graph Product) 


A graph product is a binary operation defined on the class of graphs. It is useful in 
constructing a new graph from given graphs. 

Let $G_1$ and $G_2$ be two graphs with disjoint vertex sets $\{u_1, u_2, \ldots, u_m\}$ and $\{v_1, v_2, \ldots, v_n\}$, respectively. A graph product of $G_1$ and $G_2$ is a new graph whose vertex set is $V(G_1) \times V(G_2)$, the Cartesian product of $V(G_1)$ and $V(G_2)$. The adjacency of two distinct vertices $(u_i, v_j)$ and $(u_r, v_s)$ in the product graph is determined entirely by the adjacency/equality/non-adjacency of $u_i$ and $u_r$ in $G_1$ and that of $v_j$ and $v_s$ in $G_2$. Thus, one can define $256 = 2^{8}$ different types of graph products.
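The count of 256 can be reproduced by a short enumeration. The sketch below assumes the usual reading of this count: each coordinate pair is in one of three relations, the combination (equal, equal) is excluded because the two product vertices are distinct, and a product type is a choice of adjacent or non-adjacent for each of the remaining combinations. The variable names are illustrative.

```python
from itertools import product

# Possible relations between the first coordinates, and between the second ones.
relations = ("adjacent", "equal", "non-adjacent")

# All combinations except (equal, equal), which would mean the same product vertex.
cases = [pair for pair in product(relations, repeat=2) if pair != ("equal", "equal")]

# A product type decides, for every case, whether the two product vertices are adjacent.
number_of_products = 2 ** len(cases)
print(len(cases), number_of_products)  # 8 256
```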

Section 17.1 introduces the basic definitions and concepts. In Sect. 17.2, results on 
matrix product of graphs are discussed. These results are taken from the article [6]. 
Section 17.3 deals with the realization of A(G)A(H) where the multiplication is 
taken modulo 2. Results of this section are taken from the article [7]. Some equations 
involving matrices related to graphs are explored in Sect. 17.4. Results of this section 
have appeared in the article [4]. Discussion about the product of adjacency matrices 
of a graph and its generalized complement is given in Sect. 17.5, and results of 
this section have appeared in the article [3]. In Sect. 17.6, an algorithm which tells 
whether a given graph G has a companion or not is given, and the algorithm has 
appeared in the article [5]. 

Readers are referred to [9] for all the elementary notation and definitions used but not defined in this chapter.


17.2 Matrix Product of Graphs


A graph and a matrix representing the graph uniquely (up to isomorphism) can be 
treated as identical objects in the sense that the properties of one object can be 
studied using the properties of the other. It is so much so in the case of a graph and 
its adjacency matrix that the eigenvalues of a graph are nothing but the eigenvalues 
of its adjacency matrix. So, when graph products (graph operations) are defined to 
obtain a new graph, it is natural to ask what the adjacency matrix of the resulting 
graph is, in terms of the adjacency matrices of the factors. For a complete picture of 
the eigenvalues of the product graph in terms of the eigenvalues of its factors, one 
may refer to the beautiful survey article [2]. In this section, we address almost the 
reverse of this problem. Instead of asking for the adjacency matrix of the product, 
we ask for the graph (whenever one exists) representing the product of adjacency 
matrices of given graphs. 

Since the adjacency matrix of a graph is a symmetric (0, 1)-matrix with zero 
diagonal, the following definition is natural. 


Definition 17.1 (Graphical Matrix, [1]) A symmetric (0, 1)-matrix is said to be 
graphical if all its diagonal entries are equal to zero. 


Note that a graphical matrix is the adjacency matrix of a graph. When $B$ is graphical and is equal to $A(G)$, the adjacency matrix of some graph $G$, we say that $G$ is the realization of the graphical matrix $B$. We are interested in studying the cases where the product of adjacency matrices is graphical. In fact, we are interested in answering the following question: Given a graph $G$, when is it true that there exists a graph $H$ such that $A(G)A(H)$ is graphical? We call such a graph $H$ a companion of $G$.

Let $G$ be a graph on $n$ vertices and $H$ be the graph $\overline{K_n}$. Since $A(\overline{K_n}) = 0_{n \times n}$, the zero matrix of order $n \times n$, we have $A(G)A(H) = 0_{n \times n}$ and the realizing graph is $\overline{K_n}$. Hence, $\overline{K_n}$ is a companion of every graph of order $n$. Also, an isolated vertex in either $G$ or $H$ results in an isolated vertex in the graph realizing $A(G)A(H)$, when $A(G)A(H)$ is graphical. So, for further discussion in this chapter, we consider graphs $G$ and $H$ which are neither totally disconnected nor contain any isolated vertex.

In characterizing the pairs of graphs for which the matrix product exists, we will make use of the concept of a G-H path, defined below.


Definition 17.2 (G-H Path) Given graphs $G$ and $H$ on the same set of vertices $\{v_1, v_2, \ldots, v_n\}$, two vertices $v_i$ and $v_j$ are said to have a G-H path between them if there exists a vertex $v_k$ (necessarily different from $v_i$ and $v_j$) such that $v_i \sim_G v_k$ and $v_k \sim_H v_j$ (see Fig. 17.1).


The name of a graph occurring beside an edge in the following figure, as well as in later figures, does not represent the label of the edge, but instead represents the graph to which the edge belongs.

Note that a G-H path between two vertices $v_i$ and $v_j$, whenever one exists, is a path $v_iv_kv_j$ of length two in $G \cup H$. The same G-H path also represents an H-G path $v_jv_kv_i$ between $v_j$ and $v_i$ through $v_k$.

Fig. 17.1 G-H path


Theorem 17.1 Let $G$ and $H$ be graphs on the same set of vertices $\{v_1, v_2, \ldots, v_n\}$, with adjacency matrices $A(G)$ and $A(H)$, respectively. Then $C = A(G)A(H)$ is graphical if and only if all of the following hold:


(i) $G$ and $H$ are edge disjoint.
(ii) For every two vertices $v_i$ and $v_j$, there exists at most one G-H path from the vertex $v_i$ to the vertex $v_j$.
(iii) For every two vertices $v_i$ and $v_j$, if there exists a G-H path from the vertex $v_i$ to the vertex $v_j$, then there exists an H-G path from $v_i$ to $v_j$.


Proof Suppose that $C = A(G)A(H)$ is graphical. Since the diagonal entries of $C$ are all zero, the $i$th row of $A(G)$ and the $i$th row of $A(H)$ are orthogonal for every $i$, $1 \le i \le n$. That is, $G$ and $H$ are edge disjoint. Note that the $(i, j)$-entry of $C$ is nothing but the number of G-H paths from $v_i$ to $v_j$. Since the $(i, j)$-entry of $C$ is either 0 or 1, there exists at most one G-H path between $v_i$ and $v_j$. Finally, since $C$ is symmetric, whenever there is a G-H path between $v_i$ and $v_j$, there is also an H-G path between $v_i$ and $v_j$. The converse is verified easily using matrix multiplication.
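The three conditions of Theorem 17.1 amount to saying that the product $A(G)A(H)$ has zero diagonal, has entries in $\{0, 1\}$ and is symmetric, so graphicality can be tested directly on the product. The sketch below does this for a small pair of graphs; the helper names and both example graphs (two perfect matchings on four vertices, and a pair of single edges on three vertices) are illustrative choices, not examples from the chapter.

```python
def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def is_graphical(c):
    """Symmetric (0, 1)-matrix with zero diagonal."""
    n = len(c)
    return (all(c[i][i] == 0 for i in range(n))
            and all(c[i][j] in (0, 1) for i in range(n) for j in range(n))
            and all(c[i][j] == c[j][i] for i in range(n) for j in range(n)))


def gh_paths(a_g, a_h, i, j):
    """Middle vertices k of all G-H paths i ~G k ~H j."""
    return [k for k in range(len(a_g)) if a_g[i][k] and a_h[k][j]]


# Two edge-disjoint perfect matchings on the vertices 0, 1, 2, 3.
A_G = adjacency_matrix(4, [(0, 1), (2, 3)])
A_H = adjacency_matrix(4, [(0, 2), (1, 3)])
C = matmul(A_G, A_H)
print(is_graphical(C))              # True
print(gh_paths(A_G, A_H, 0, 3))     # [1]
print(gh_paths(A_H, A_G, 0, 3))     # [2]

# A pair violating condition (iii): a G-H path from 0 to 2 with no H-G path.
A_G2 = adjacency_matrix(3, [(0, 1)])
A_H2 = adjacency_matrix(3, [(1, 2)])
print(is_graphical(matmul(A_G2, A_H2)))  # False
```

In the first pair the product realizes the third perfect matching $\{(0, 3), (1, 2)\}$, and the G-H and H-G paths between 0 and 3 pass through different middle vertices.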


The above result leads to the following definition. 


Definition 17.3 (Matrix Product of Graphs) Given graphs $G$, $H$, and $\Gamma$ on the same set of vertices, the graph $\Gamma$ is said to be the matrix product of graphs $G$ and $H$ if $A(G)A(H)$ is graphical and $A(\Gamma) = A(G)A(H)$. Then we say that $G$ and $H$ are the graph factors of the matrix product $\Gamma$.


Remark 17.1 Let $G$ and $H$ be graphs on the same set of vertices and $\Gamma$ be the matrix product of $G$ and $H$. For any two vertices $v_i$ and $v_j$, the G-H and H-G paths from $v_i$ to $v_j$, when they exist, are not through the same vertex. Also note that the vertices $v_i$ and $v_j$ are adjacent in $\Gamma$ if and only if $G \cup H$ has the structure given in Fig. 17.2.


We shall now prove that the product graph $\Gamma$ of $G$ and $H$ always has an even number of edges. To do this, we need the following definition.


Definition 17.4 Given two disjoint edges $e = (x, y)$ and $f = (u, v)$ in a matrix product $\Gamma$ of the graphs $G$ and $H$, $f$ is said to be parallel to $e$ if $xuy$ is a G-H path (H-G path) and $xvy$ is an H-G path (G-H path) between the vertices $x$ and $y$, as shown in Fig. 17.3.


Theorem 17.2 Given an edge $e = (x, y)$ in the matrix product $\Gamma$ of the graphs $G$ and $H$, the following statements are true:


(i) There exists an edge $f = (u, v)$ in $\Gamma$, disjoint from $e$, such that $f$ is parallel to $e$.
(ii) If $f$ is parallel to $e$ then $e$ is parallel to $f$.
(iii) The edge $f$ parallel to $e$ is unique.


Proof (i) Let $e = (x, y)$ be an edge in $\Gamma$. Since $\Gamma$ is a matrix product of $G$ and $H$, there exist vertices $u$ and $v$ in $\Gamma$ such that

$$x \sim_G u \sim_H y, \quad \text{and} \quad x \sim_H v \sim_G y. \tag{17.1}$$

But this implies that

$$u \sim_G x \sim_H v, \quad \text{and} \quad u \sim_H y \sim_G v. \tag{17.2}$$

In other words, $(u, v) = f$ is an edge in $\Gamma$ and, from the definition of the parallel edge and from Eq. (17.1), $f$ is parallel to $e$.

(ii) From the equivalence of Eqs. (17.1) and (17.2), we see that $f$ is parallel to $e$ if and only if $e$ is parallel to $f$.

(iii) Suppose that $f = (u, v)$ and $f' = (u', v')$ are two edges parallel to $e$ in $\Gamma$. Then $x \sim_G u \sim_H y$, $x \sim_H v \sim_G y$ and $x \sim_G u' \sim_H y$, $x \sim_H v' \sim_G y$ (see Fig. 17.4).

Hence, we get two G-H paths $xuy$ and $xu'y$ between $x$ and $y$ and similarly, we get two H-G paths $xvy$ and $xv'y$ between $x$ and $y$. This contradicts (ii) of Theorem 17.1. This proves the uniqueness of the parallel edge $f$.

Fig. 17.3 Parallel edges in $\Gamma$

Fig. 17.4 Two G-H and H-G paths between $x$ and $y$


Theorem 17.3 If $\Gamma$ is the matrix product of graphs $G$ and $H$, then $\Gamma$ has an even number of edges.


Proof From Theorem 17.2, it is clear that every edge $(x, y)$ in $\Gamma$ has a unique parallel edge and the relation 'is parallel to' is symmetric. The relation is not an equivalence relation, but it partitions the set of edges of $\Gamma$ and each part consists of exactly two edges that are parallel to each other. Hence, the number of edges in $\Gamma$ is even.


Let $G$, $H$, and $\Gamma$ be graphs defined on the same set of vertices, and let $\Gamma$ be the matrix product of $G$ and $H$. Then it is natural to expect a formula for the degree of any vertex in $\Gamma$ in terms of its degree in $G$ and its degree in $H$. We address this in the next theorem.

First we prove the following lemma.

Lemma 17.1 Given graphs $G$, $H$, and $\Gamma$ on the same set of vertices such that $\Gamma$ is a matrix product of $G$ and $H$, for each vertex $v_i$ of $\Gamma$,

$$\deg_\Gamma v_i = \sum_{v_j \sim_H v_i} \deg_G v_j. \tag{17.3}$$


Proof Recall that every edge in the matrix product $\Gamma$ corresponds to an H-G path. Thus, the number of edges of $\Gamma$ incident with $v_i$ is equal to the number of H-G paths of the form $v_i \sim_H v_j \sim_G v_k$, for some vertices $v_j$ and $v_k$ of $\Gamma$. The result follows.


Note 17.1 Since the adjacency matrix of $\Gamma$ is symmetric, $A(G)$ and $A(H)$ commute and therefore

$$\deg_\Gamma v_i = \sum_{v_j \sim_G v_i} \deg_H v_j. \tag{17.4}$$


Note 17.2 Let $\Gamma$ be the matrix product of $G$ and $H$. If there is a path between the vertices $v_i$ and $v_j$ in $G$, then $\deg_H v_i = \deg_H v_j$.


Theorem 17.4 Let $G$, $H$, and $\Gamma$ be graphs such that $\Gamma$ is the matrix product of $G$ and $H$. Then for each vertex $v_i$ of $\Gamma$,

$$\deg_\Gamma v_i = \deg_G v_i \cdot \deg_H v_i. \tag{17.5}$$

Proof From Lemma 17.1, we have $\deg_\Gamma v_i = \sum_{v_j \sim_H v_i} \deg_G v_j$. In the summation on the right-hand side, we consider all $j$ such that $v_j \sim_H v_i$. Therefore, from Note 17.2 (with the roles of $G$ and $H$ interchanged), we get

$$\deg_G v_j = \deg_G v_i$$

for all $j$ with $v_j \sim_H v_i$. Thus,

$$\begin{aligned}
\deg_\Gamma v_i &= \sum_{v_j \sim_H v_i} \deg_G v_j \\
&= (\deg_G v_i) \sum_{v_j \sim_H v_i} 1 \\
&= \deg_G v_i \cdot \deg_H v_i.
\end{aligned}$$
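The degree identities can be checked numerically on a concrete matrix product. In the sketch below, the vertex set is $\{0, 1, \ldots, 7\}$ read as 3-bit strings; $G$ joins $x$ to $x \oplus 1$ and $x \oplus 2$ (two disjoint 4-cycles) and $H$ joins $x$ to $x \oplus 4$ (a perfect matching). This pair of graphs and the helper names are illustrative assumptions rather than an example from the chapter.

```python
def xor_graph(n, masks):
    """Adjacency matrix of the graph on 0..n-1 in which x ~ (x XOR mask) for each mask."""
    a = [[0] * n for _ in range(n)]
    for x in range(n):
        for mask in masks:
            a[x][x ^ mask] = 1
    return a


def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def is_graphical(c):
    """Symmetric (0, 1)-matrix with zero diagonal."""
    n = len(c)
    return (all(c[i][i] == 0 for i in range(n))
            and all(c[i][j] in (0, 1) for i in range(n) for j in range(n))
            and all(c[i][j] == c[j][i] for i in range(n) for j in range(n)))


n = 8
A_G = xor_graph(n, (1, 2))   # two disjoint 4-cycles
A_H = xor_graph(n, (4,))     # perfect matching between the two cycles
A_Gamma = matmul(A_G, A_H)

deg_G = [sum(row) for row in A_G]
deg_H = [sum(row) for row in A_H]
deg_Gamma = [sum(row) for row in A_Gamma]

# Lemma 17.1: deg_Gamma(v_i) equals the sum of deg_G over the H-neighbours of v_i.
lemma_sums = [sum(deg_G[j] for j in range(n) if A_H[i][j]) for i in range(n)]

# Theorem 17.4: deg_Gamma(v_i) = deg_G(v_i) * deg_H(v_i).
products = [deg_G[i] * deg_H[i] for i in range(n)]

edge_count = sum(deg_Gamma) // 2
print(is_graphical(A_Gamma), deg_Gamma == lemma_sums == products, edge_count)  # True True 8
```

Both factors are regular here, so the example does not exercise the general case; it only shows the identities holding, together with the even edge count of Theorem 17.3.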


Note 17.3 Let the graphs $G$ and $H$ be defined on the same set of vertices $\{v_1, v_2, \ldots, v_n\}$, and let $\Gamma$ be the matrix product of $G$ and $H$. If $G$ and $H$ are regular graphs with regularities $r$ and $s$ respectively, then $\Gamma$ is regular with regularity $rs$.


The following is a corollary to Lemma 17.1. 


Corollary 17.1 Let $\Gamma$ be the matrix product of the graphs $G$ and $H$. If one of the graphs $G$ and $H$ is connected, then the other graph is regular.


Proof First, we shall prove that $\deg_H v_1 = \deg_H v_2$ for every pair of vertices $v_1$ and $v_2$ adjacent in $G$, i.e., $v_1 \sim_G v_2$. For this purpose, we show that there is a one-to-one correspondence between the H-adjacencies of $v_1$ and the H-adjacencies of $v_2$. The case $\deg_H v_1 = \deg_H v_2 = 1$ is trivial.

To consider the non-trivial case, let $\deg_H v_2 \ge 2$. Consider any two vertices $v_3$ and $v_4$ that are adjacent to $v_2$ in $H$. Since $v_1 \sim_G v_2 \sim_H v_3$ and $v_1 \sim_G v_2 \sim_H v_4$ are G-H paths and $\Gamma$ is the matrix product of $G$ and $H$, we see that the vertices $v_3$ and $v_4$ are adjacent to $v_1$ in $\Gamma$.

Now $v_1 \sim_\Gamma v_3$ implies that, as in Remark 17.1, there exists an H-G path $v_1 \sim_H v_5 \sim_G v_3$ through a vertex $v_5 \neq v_2$. Similarly, $v_1 \sim_\Gamma v_4$ implies that there exists an H-G path $v_1 \sim_H v_6 \sim_G v_4$ through a vertex $v_6 \neq v_2$. Hence, the vertices $v_5$ and $v_6$ adjacent to $v_1$ in $H$ respectively correspond to the vertices $v_3$ and $v_4$ adjacent to $v_2$ in $H$. Now, we shall prove that $v_5 \neq v_6$ in order to establish the one-to-one correspondence between the adjacencies of $v_1$ and $v_2$ in $H$.

Suppose that $v_5 = v_6$. Then $v_5$ is adjacent to $v_3$ and $v_4$ in $G$, and both $v_3$ and $v_4$ are adjacent to $v_2$ in $H$. Then there are two G-H paths $v_5 \sim_G v_3 \sim_H v_2$ and $v_5 \sim_G v_4 \sim_H v_2$ from $v_5$ to $v_2$ (see Fig. 17.5), which contradicts (ii) of Theorem 17.1.

This proves that $\deg_H v_1 = \deg_H v_2$ whenever $v_1$ and $v_2$ are adjacent in $G$. Thus, if $G$ is connected, there exists a path between every pair of vertices and the degrees of all vertices of $H$ are identical. In other words, $H$ is regular.

Fig. 17.5 G-H paths between $v_5$ and $v_2$


Note 17.4 From the proof of Corollary 17.1, the following is evident. If the matrix product of $G$ and $H$ exists, then vertices $v_i$ and $v_j$ are adjacent in one of the graphs $G$ and $H$ only if they are of the same degree in the other.


Note 17.5 Let $\Gamma$ be the matrix product of $G$ and $H$. Suppose that one of $G$ and $H$, say $G$, has a full degree vertex $v$. Then $G$ is connected, and by Corollary 17.1, $H$ is regular. We also know that $H$ is a subgraph of $\overline{G}$ and $\overline{G}$ has $v$ as a vertex of degree zero. Hence, all the vertices of $H$ are of degree zero, i.e., $H$ is trivial, and so is $\Gamma$.


Now we define the concept of degree partition of a graph, which will be useful in establishing some further results regarding vertex degrees in the matrix product.


Definition 17.5 (Degree Partition of a Graph) For a graph $G$ on the set of vertices $\{v_1, v_2, \ldots, v_n\}$, the degree partition of the graph $G$ is a partition $\{V_0, V_1, \ldots, V_k\}$ of the vertex set $V(G)$ with $V_j = \{ v_i \mid \deg_G v_i = j \}$, $0 \le j \le k$ and $k = \Delta(G) = \max \{\deg_G v_i\}$.

Note 17.6 Let $\Gamma$ be the matrix product of $G$ and $H$. From the definition of the degree partition of the graph $G$ and by Note 17.2, it is clear that every component of $H$ lies entirely in one part of the degree partition of $G$.


Corollary 17.2 Let $G$, $H$, and $\Gamma$ be graphs such that $\Gamma$ is the matrix product of $G$ and $H$. Then the following hold:


(i) If any two of $G$, $H$, and $\Gamma$ are regular, then so is the third.
(ii) $\Gamma$ is regular if and only if $\deg_G v_i \cdot \deg_H v_i$ is constant for all $i$, $1 \le i \le n$.
(iii) A connected graph $\Gamma$ is Eulerian if and only if the degree of each vertex is even in at least one of the graphs $G$ and $H$.


Corollary 17.3 If $\Gamma$ is the matrix product of $G$ and $H$, where $G$ or $H$ is a regular graph, then $\Gamma$ cannot be a tree.

Proof Suppose that $\Gamma$ is a tree. Then $\Gamma$ has a pendant vertex, say $v$. By Theorem 17.4, $\deg_G v \cdot \deg_H v = \deg_\Gamma v = 1$, which implies that $\deg_G v = \deg_H v = 1$. Without loss of generality, let $H$ be regular. Then $H$ is 1-regular and hence contains an even number of vertices. Thus, $\Gamma$ is a tree on an even number of vertices and hence contains an odd number of edges. This contradicts the fact that $\Gamma$ has an even number of edges (see Theorem 17.3).


Theorem 17.5 A tree has no non-trivial companion. 


Proof Let $G$ be a tree on $n$ vertices. We prove the result by induction on $n$. The result is obvious when $n = 1$ or $2$. Now, suppose that the result holds for all trees of order less than $n > 2$, and let $G$ be a tree of order $n$.

Let $P$ be the set of all pendant vertices of $G$, and note that $|P| \ge 2$. Let $G' = G \setminus P$ be the tree obtained by deleting all pendant vertices of $G$. Suppose, for the sake of contradiction, that $G$ has a non-trivial companion $H$, and let $H' = H \setminus P$ be the subgraph of $H$ obtained by deleting the same vertices. As $G$ is connected, $H$ is regular, and hence has no isolated vertices. Therefore, if $u$ is a pendant vertex of $G$, then $u \sim_H v$ for some vertex $v$, and by Note 17.4, $v$ is also a pendant vertex of $G$. Now, let $x$ and $y$ be the vertices adjacent to $u$ and $v$, respectively, in $G$. Then $x \sim_G u \sim_H v$ is a G-H path from $x$ to $v$, and hence there exists a G-H path from $v$ to $x$, which is necessarily $v \sim_G y \sim_H x$, since $y$ is the unique vertex adjacent to $v$ in $G$. Thus, $x \sim_H y$, which shows that $H'$ is non-trivial (noting that $x$ and $y$ cannot be pendant vertices of $G$). The preceding argument also shows that if $u$ is a pendant vertex of $G$, then for each edge $(u, y)$ in the matrix product of $G$ and $H$, the unique parallel edge of $(u, y)$ also has a pendant vertex of $G$ as one of its end vertices.

To see that $H'$ is a companion of $G'$, we verify that the three conditions given in Theorem 17.1 hold for $G'$ and $H'$. Conditions (i) and (ii) obviously hold by virtue of the fact that $G'$ and $H'$ are subgraphs, respectively, of $G$ and $H$, and $H$ is a companion of $G$. For (iii), let $s$ and $t$ be any two vertices of $G'$, and suppose that there exists a $G'$-$H'$ path $s \sim_{G'} p \sim_{H'} t$ from $s$ to $t$. We need to show that there exists a $G'$-$H'$ path from $t$ to $s$. Note that $s$, $t$, and $p$ are non-pendant vertices of $G$, and hence $s \sim_G p \sim_H t$ is a G-H path from $s$ to $t$. Since $H$ is a companion of $G$, (iii) of Theorem 17.1 implies that there exists a G-H path $t \sim_G q \sim_H s$ from $t$ to $s$. Then $(s, t)$ and $(p, q)$ are parallel edges in the matrix product of $G$ and $H$. As observed earlier, any edge of the matrix product that has a pendant vertex of $G$ as one of its end vertices must be parallel to another edge with the same property. Since $s$, $t$, and $p$ are non-pendant in $G$, it follows that $q$ is also not a pendant vertex of $G$, and hence is present in $G'$. Hence, $t \sim_{G'} q \sim_{H'} s$ is a $G'$-$H'$ path, as required.

Thus, $H'$ is a non-trivial companion of $G'$, a tree of order less than $n$, which contradicts the induction hypothesis.


Note 17.7 Let $G$, $H$, and $\Gamma$ be graphs on the same set of vertices, and let $\Gamma$ be the matrix product of $G$ and $H$. Let $m_1$, $m_2$, and $m_3$ denote the number of edges in $G$, $H$, and $\Gamma$, respectively. Then, by Theorem 17.4,

$$\sum_i \deg_\Gamma v_i = \sum_i \deg_G v_i \cdot \deg_H v_i.$$

Let $G$ be a connected graph and $r$ be the regularity of $H$. Then the above equation becomes $\sum_i \deg_\Gamma v_i = \sum_i r \deg_G v_i$. That is, $2m_3 = r(2m_1)$ or $m_3 = rm_1$.

But, by Theorem 17.3, $\Gamma$ has an even number of edges, which implies that either $r$ or $m_1$ is even.


Note 17.8 Let $G$, $H$, and $\Gamma$ be graphs on the same set of vertices, and let $\Gamma$ be the matrix product of $G$ and $H$. Further, let $G$ be connected and $H$ be non-trivial. Let $\{V_1, V_2, \ldots, V_{\Delta(G)}\}$ be the degree partition of $G$. Then $H$ is regular and the following hold:


(i) $|V_i| \neq 1$ for any $i$, $1 \le i \le \Delta(G)$.
(ii) If $|V_i| = 2$ for some $i$, $1 \le i \le \Delta(G)$, then $H$ is a 1-factor graph and the number of vertices is even.
(iii) If $|V_i| = 3$ for some $i$, $1 \le i \le \Delta(G)$, then $H$ is a union of cycles.
(iv) If $|V_i|$ is odd for some $i$, $1 \le i \le \Delta(G)$, then the regularity of $H$ is even.


By Theorem 17.5, the path graph $P_n$ has no non-trivial companion. Given a connected,
non-complete graph $G$, the graph power $G^k$ has more edges than $G$. We know
that any companion of $G$ is a spanning subgraph of the complement of $G$. Since
$\overline{G^k}$ has fewer edges than $\overline{G}$, it is natural to expect that $G^k$ has less chance of
having a non-trivial companion. We strengthen this by proving that $P_n^m$ has no
non-trivial companion. The formal definition of the $m$th power of a path graph $P_n$ is
given below.


Definition 17.6 The $m$th power of a path graph $P_n$, denoted by $P_n^m$, is a graph that
has the same set of vertices as $P_n$, and in which two vertices are adjacent whenever
the distance between them in $P_n$ is at most $m$.
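Definition 17.6 can be sketched in a few lines of Python. The function names `path_power_adjacency` and `degrees` are illustrative choices, not notation from the text, and vertices are indexed from 0 rather than 1.

```python
def path_power_adjacency(n, m):
    """Adjacency matrix of P_n^m: v_i ~ v_j iff their distance in P_n,
    which is |i - j|, lies between 1 and m."""
    return [[1 if 1 <= abs(i - j) <= m else 0 for j in range(n)]
            for i in range(n)]


def degrees(adj):
    """Degrees of the vertices in the order v_1, ..., v_n."""
    return [sum(row) for row in adj]


# n = 7 and m = 2: the end vertices have degrees m and m + 1,
# and every other vertex has degree 2m.
print(degrees(path_power_adjacency(7, 2)))
```

For $m \ge n - 1$ the same function returns the adjacency matrix of the complete graph $K_n$, in line with the remark that follows the definition.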


Since the diameter of $P_n$ is $n - 1$, for every $m \ge n - 1$, $P_n^m = K_n$, the complete
graph on $n$ vertices. Since, either when $G = K_n$ or when $G = P_n$, there is no graph $H$
such that $A(G)A(H)$ is realizable, we consider $P_n^m$, $2 \le m < n - 1$, in the following
discussion.

Consider $P_n$ with vertex set $\{v_1, v_2, \ldots, v_n\}$, where $v_i \sim v_{i+1}$ for $1 \le i \le n - 1$.
We note that when $n$ is even and $n > 2m$, and when $n$ is odd and $n > 2m + 1$, there is
no full degree vertex in $P_n^m$. If $P_n^m$ has a full degree vertex, then $P_n^m$ has no non-trivial
companion. So, when $n$ is even and $n > 2m$, and when $n$ is odd and $n > 2m + 1$,
we observe the following.

In $P_n^m$, $v_i$ and $v_{n+1-i}$ are the only vertices with degree $m + i - 1$, $1 \le i \le m$. All
the other vertices are of degree $2m$. Also note that any $m + 1$ consecutive vertices in
$P_n$ of degree at least $m$ form a complete graph $K_{m+1}$ in $P_n^m$. To prove that $P_n^m$,
$2 \le m < n - 1$ and $n > 1$, has no companion, we make use of the following lemma.


Lemma 17.2 Let $G$ be a graph and $H$ be a subgraph of $\overline{G}$ such that $A(G)A(H)$ is
graphical. Then, for any two vertices $u$ and $v$ with $u \sim_H v$, the following property
holds.

For any $k$, corresponding to every vertex in $N_G(u)$ having degree $k$ in $G$, there
is a unique vertex in $N_G(v)$ with degree $k$ in $G$ and vice versa. In other words, the
sequence of degrees in $G$ of vertices in $N_G(u)$ is the same as the sequence of degrees
in $G$ of vertices in $N_G(v)$, when both of them are written in non-increasing order.

Proof Let $A(G)A(H) = A(\Gamma)$. Since $u \sim_H v$, by Note 17.4, $\deg_G(u) = \deg_G(v)$.
Let $x \in N_G(u)$ be of degree $k$ in $G$. Then, $x \sim_G u \sim_H v$ is a $G$-$H$ path. Since
$A(G)A(H) = A(\Gamma)$, there is a $G$-$H$ path from $v$ to $x$ through a vertex, say $y$, i.e.,
$v \sim_G y \sim_H x$. As $y \sim_H x$, by Note 17.4, $\deg_G(y) = \deg_G(x) = k$.

Because of the uniqueness of the $G$-$H$ path from $v$ to $x$, it follows that $y$ is the
only such vertex with degree $k$ in $N_G(v)$. Hence, for every vertex $x$ in $N_G(u)$ with
$\deg_G(x) = k$, there is a unique vertex $y$ in $N_G(v)$ such that $\deg_G(y) = k$. The other
side can be obtained similarly.


Theorem 17.6 For positive integers $n$ and $m$, $2 \le m < n - 1$, there exists no
non-trivial subgraph $H$ of $\overline{P_n^m}$ such that $A(P_n^m)A(H)$ is graphical.


Proof Let $G = P_n^m$. Suppose that there exists a non-trivial subgraph $H$ of $\overline{G}$ such
that $A(G)A(H)$ is graphical. Since $G$ is connected, $H$ is regular. Also, $v_1$ and $v_n$ are
the only vertices of degree $m$ in $G$, and $H$ is non-trivial, which together imply that
$v_1 \sim_H v_n$ and $H$ is 1-regular. Then $v_2 \sim_H v_{n-1}$, $v_3 \sim_H v_{n-2}$, \ldots, $v_m \sim_H v_{n-m+1}$,
since each pair contains the only two vertices of that particular degree in $G$. The
remaining $n - 2m$ vertices in $G$ are of degree $2m$. Since the vertices $v_{m+1}$ and $v_{n-m}$ are
the only vertices of degree $2m$ which have the same neighboring degree sequence,
by Lemma 17.2, $v_{m+1}$ and $v_{n-m}$ are adjacent in $H$. Similarly, the vertices $v_{m+2}$ and
$v_{n-m-1}$ are adjacent in $H$. Continuing in this manner, we reach a stage where two
vertices $x$ and $y$ of degree $2m$ in $G$ are adjacent in $H$, where the distance between $x$
and $y$ in $P_n$ is less than or equal to $m$. This is a contradiction: when the distance
between $x$ and $y$ in $P_n$ is at most $m$, we have $x \sim_G y$, whereas $H$ is a subgraph of
$\overline{G}$. Hence, there is no non-trivial graph $H$ such that $A(G)A(H)$ is graphical when
$n > 1$ and $2 \le m < n - 1$.


We know that a companion $H$ of $G$ is a subgraph of $\overline{G}$. It is therefore natural to ask
what happens when $H = \overline{G}$. The next part of this section deals with this question.
Note that for any $G$, either $G$ or $\overline{G}$ is connected. Hence, if $A(G)A(\overline{G})$ is graphical
then both $G$ and $\overline{G}$ are regular. Therefore, it is enough to characterize regular graphs
$G$ for which $A(G)A(\overline{G})$ is graphical.


Theorem 17.7 If $G$ is a regular graph on the set of vertices $\{v_1, v_2, \ldots, v_n\}$, with
regularity 1 or $n - 2$, then the product $A(G)A(\overline{G})$ is graphical. In fact, if $G$ is
1-regular, then

\[ A(G)A(\overline{G}) = A(\overline{G})A(G) = A(\overline{G}) \]

and if $G$ is $(n - 2)$-regular, then

\[ A(G)A(\overline{G}) = A(\overline{G})A(G) = A(G). \]


Proof We shall prove the result in the case where $G$ is $(n - 2)$-regular. The case of
$G$ being 1-regular follows by symmetry as $\overline{G}$ is then $(n - 2)$-regular. Note that in
these cases, $n$ is necessarily greater than three.

Consider any two vertices $v_1$ and $v_2$ in $G$ and two distinct cases based on whether
or not $v_1$ and $v_2$ are adjacent in $G$.

[Figure omitted: panels (a) $G$, (b) $\overline{G}$, (c) $\Gamma$]

Fig. 17.6 $G = C_4$, $\overline{G}$ a 1-regular graph

[Figure omitted: panels (a) $G$, (b) $\overline{G}$, (c) $\Gamma$]

Fig. 17.7 $G$ a cocktail party graph on six vertices, a (6-2)-regular graph


Case I. $v_1$ and $v_2$ are adjacent in $G$: Since $G$ is $(n - 2)$-regular, there exists a
unique vertex $u$ such that $v_2$ and $u$ are adjacent in $\overline{G}$, and $u$ is adjacent to $v_1$ in $G$.
Hence, we get a $G$-$\overline{G}$ path $v_1 \sim_G u \sim_{\overline{G}} v_2$ between $v_1$ and $v_2$. Again by
$(n - 2)$-regularity of $G$, there is no vertex $w$ different from $u$ such that $v_2$ and $w$ are
adjacent in $\overline{G}$. Hence the $G$-$\overline{G}$ path from $v_1$ to $v_2$ is unique. Similarly, we get a unique
$\overline{G}$-$G$ path from $v_1$ to $v_2$.

Case II. $v_1$ and $v_2$ are not adjacent in $G$: Since $v_1$ and $v_2$ are not adjacent in $G$,
they are adjacent in $\overline{G}$ and not adjacent to any other vertex in $\overline{G}$. Therefore, there is
no $G$-$\overline{G}$ or $\overline{G}$-$G$ path from $v_1$ to $v_2$.

From Case I and Case II above, it is clear that for every pair of vertices there
exists at most one pair of $G$-$\overline{G}$ and $\overline{G}$-$G$ paths, and from Theorem 17.1, it follows
that $A(G)A(\overline{G})$ is graphical. Again referring to the cases, we observe that a pair of
vertices being adjacent (respectively not adjacent) in $G$ implies that the vertices are
adjacent (respectively not adjacent) in the matrix product $\Gamma$ of $G$ and $\overline{G}$. Therefore,
$\Gamma = G$ and $A(G)A(\overline{G}) = A(G) = A(\overline{G})A(G)$.


Figures 17.6 and 17.7 illustrate the cases n = 4, 6. 
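The identity in Theorem 17.7 can be checked numerically for the cocktail party graph on six vertices. The sketch below assumes a particular labelling (each $v_i$ is non-adjacent only to $v_{i+3}$); the helper names are illustrative and not taken from the text.

```python
def matmul(a, b):
    """Ordinary integer product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def complement(a):
    """Adjacency matrix of the complement graph (zero diagonal kept)."""
    n = len(a)
    return [[0 if i == j else 1 - a[i][j] for j in range(n)] for i in range(n)]


def cocktail_party(n):
    """(n-2)-regular graph on an even number n of vertices: v_i is
    non-adjacent only to itself and to v_{i + n/2} (assumed labelling)."""
    half = n // 2
    return [[0 if (j - i) % n in (0, half) else 1 for j in range(n)]
            for i in range(n)]


g = cocktail_party(6)
g_bar = complement(g)          # a perfect matching, i.e. a 1-regular graph
product = matmul(g, g_bar)     # no reduction modulo 2 here
print(product == g, matmul(g_bar, g) == g)
```

Reading the same computation with the roles swapped (the matching as $G$ and the cocktail party graph as $\overline{G}$) covers the 1-regular case of the theorem.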


Remark 17.2 In Theorem 17.7, we observed that $A(G)A(\overline{G}) = A(G)$ whenever $G$
is $(n - 2)$-regular. In other words, $A(\overline{G})$, when $\overline{G}$ is a 1-factor, acts like a projector
on the column space as well as the row space of $A(G)$. But it is not true that $A(\overline{G})$
is a projector.


The following theorem characterizes the regular graphs $G$ for which $A(G)A(\overline{G})$
is graphical.

[Figure omitted: panels (a) $G$, (b) $H$, (c) $\Gamma$]

Fig. 17.8 $G = C_5$, $H = \overline{C_5} = C_5$, $\Gamma = K_5$


Theorem 17.8 Given a regular graph $G$, if the non-trivial product $A(G)A(\overline{G})$ is
graphical then exactly one of the following holds:

(i) $G$ is 1-regular
(ii) $G$ is $(n - 2)$-regular (i.e., it is a cocktail party graph on $n$ vertices)
(iii) $G = C_5$.


Proof Consider a regular graph $G$ with regularity $k$ such that $A(G)A(\overline{G})$ is
graphical. Clearly, $\overline{G}$ is also regular with regularity $n - k - 1$. From Note 17.3, the
graph $\Gamma$ realizing $A(G)A(\overline{G})$ is also regular with regularity $k(n - k - 1)$. Therefore,
$k(n - k - 1) \le n - 1$ and we get

\[ n \le \frac{k}{k-1} + (k + 1) \tag{17.6} \]

for $k \neq 1$.

In the case of $k = 1$, i.e., $G$ is 1-regular and among the class of graphs described
in (i), we know from Theorem 17.7 that $A(G)A(\overline{G})$ is graphical.

If $k \neq 1$, note that $1 < \frac{k}{k-1} \le 2$ and by substituting the upper limit in (17.6) we
get $n \le k + 3$ or $k \ge n - 3$. That is,

\[ k = n - 1, \quad n - 2, \quad \text{or} \quad n - 3. \]

If $k = n - 1$, $G$ is the complete graph and $\overline{G}$ is the null graph, which is a trivial
case.

If $k = n - 2$, then the graph $G$ is $(n - 2)$-regular and among the class of graphs
described in (ii) and hence, by Theorem 17.7, $A(G)A(\overline{G})$ is graphical.

If $k = n - 3$ and $k \neq 1$, then the regularity of $\overline{G}$ is 2 and therefore $(n - 3)2 \le
n - 1$, which implies that $n = 5$. That is, $G = C_5$.

Note that for the graph $G = C_5$, $\overline{G} = \overline{C_5} = C_5$ and in $G \cup \overline{G}$, it is easy to observe
that between every two vertices in $\{v_1, v_2, v_3, v_4, v_5\}$ there exists a unique $G$-$\overline{G}$ path
and a unique $\overline{G}$-$G$ path. Thus, $A(G)A(\overline{G})$ is graphical and the matrix product $\Gamma$ is
the complete graph $K_5$ (see Fig. 17.8). This concludes the proof.

Remark 17.3 Whenever $H$ is a 1-factor graph and $\Gamma$ is the matrix product of $G$ and
$H$, then it is also true that $G$ is the matrix product of $\Gamma$ and $H$. That is, $A(\Gamma)A(H) =
A(G)$. This follows from Theorem 17.7, and the fact that $A(H)$ is a self-inverse
permutation matrix.
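Case (iii) of Theorem 17.8 is easy to confirm by direct multiplication. The following sketch, with illustrative helper names, multiplies $A(C_5)$ by the adjacency matrix of its complement and, for contrast, repeats the computation for $C_6$, which is 2-regular but falls under none of the three cases.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def complement(a):
    n = len(a)
    return [[0 if i == j else 1 - a[i][j] for j in range(n)] for i in range(n)]


def cycle(n):
    """Adjacency matrix of the cycle v_1 ~ v_2 ~ ... ~ v_n ~ v_1."""
    return [[1 if (j - i) % n in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]


def is_graphical(m):
    """Symmetric (0,1)-matrix with zero diagonal."""
    n = len(m)
    return all(m[i][i] == 0 for i in range(n)) and all(
        m[i][j] in (0, 1) and m[i][j] == m[j][i]
        for i in range(n) for j in range(n))


c5 = cycle(5)
product5 = matmul(c5, complement(c5))
k5 = [[0 if i == j else 1 for j in range(5)] for i in range(5)]

c6 = cycle(6)
product6 = matmul(c6, complement(c6))

print(product5 == k5, is_graphical(product6))
```

The edge count of the product for $C_5$ is $10 = 2 \cdot 5$, which agrees with the relation $m_3 = r m_1$ of Note 17.7.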


17.3 Realization of A(G)A(H) (Modulo 2)


In this section, we discuss the realization of the matrix product under modulo 2
matrix multiplication.


Definition 17.7 (Modulo 2 Matrix Product of Graphs) The graph $\Gamma$ is said to be the
modulo 2 matrix product of graphs $G$ and $H$ if $A(G)A(H)$ (mod 2) is graphical
and $\Gamma$ is the realization of $A(G)A(H)$ (mod 2).


Note 17.9 As in the case of the usual matrix product of graphs, in the case of modulo 
2 product also, we consider only graphs that are not totally disconnected and have 
no isolated vertices. 


Note 17.10 The definition of a $G$-$H$ path between two vertices $v_i$ and $v_j$ ($i \neq j$)
remains the same as in the case of the general product.


Definition 17.8 (Parity Regular Graph) A graph is said to be parity regular if the 
degrees of its vertices are either all even or all odd. 


Note 17.11 In the case of the usual product of the adjacency matrices of graphs $G$
and $H$, a necessary and sufficient condition for realizability of $A(G)A(H)$ is that $H$
must be a subgraph of $\overline{G}$ and for each ordered pair of distinct vertices, the number of
$G$-$H$ paths and the number of $H$-$G$ paths are the same and are equal to 1 or 0. When
the matrix product is taken modulo 2, this condition is no longer necessary. In fact,
the class of graphs $H$ for which $A(G)A(H)$ (mod 2) is graphical is found to be a
larger class. In Example 17.3, we have a graph $H$ such that $A(G)A(H)$ (mod 2) is
graphical but $H$ is not a subgraph of $\overline{G}$.


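Example 1


In the following graphs $G$ and $H$ (as shown in Fig. 17.9) on seven vertices, $H$ is not
a subgraph of $\overline{G}$. Note that the adjacency matrices of $G$ and $H$ are

\[
A(G) = \begin{pmatrix}
0&0&1&0&0&1&0\\
0&0&0&1&0&0&1\\
1&0&0&0&1&0&0\\
0&1&0&0&0&1&0\\
0&0&1&0&0&0&1\\
1&0&0&1&0&0&0\\
0&1&0&0&1&0&0
\end{pmatrix}
\quad\text{and}\quad
A(H) = \begin{pmatrix}
0&0&1&1&1&1&0\\
0&0&0&1&1&1&1\\
1&0&0&0&1&1&1\\
1&1&0&0&0&1&1\\
1&1&1&0&0&0&1\\
1&1&1&1&0&0&0\\
0&1&1&1&1&0&0
\end{pmatrix}.
\]

Then

\[
A(G)A(H) \pmod 2 = \begin{pmatrix}
0&1&1&1&1&1&1\\
1&0&1&1&1&1&1\\
1&1&0&1&1&1&1\\
1&1&1&0&1&1&1\\
1&1&1&1&0&1&1\\
1&1&1&1&1&0&1\\
1&1&1&1&1&1&0
\end{pmatrix}.
\]

Clearly, $A(G)A(H)$ (mod 2) is graphical and the graph $\Gamma$ realizing it is $K_7$.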


[Figure omitted: panels (a) $G$, (b) $H$, (c) $\Gamma = K_7$]

Fig. 17.9 Graphs $G$ and $H$ for which $K_7$ is the realization of $A(G)A(H)$
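The matrices of Example 1 are circulant, so the example can be reproduced from their first rows alone. The sketch below is illustrative (the helper names are not from the text); it forms the integer product and then reduces it modulo 2.

```python
def circulant(first_row):
    """Circulant matrix whose (i, j) entry is first_row[(j - i) mod n]."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


a_g = circulant([0, 0, 1, 0, 0, 1, 0])   # first row of A(G) in Example 1
a_h = circulant([0, 0, 1, 1, 1, 1, 0])   # first row of A(H) in Example 1

integer_product = matmul(a_g, a_h)
mod2_product = [[x % 2 for x in row] for row in integer_product]
k7 = [[0 if i == j else 1 for j in range(7)] for i in range(7)]

print(mod2_product == k7)          # the modulo 2 product realizes K_7
print(integer_product[0][0])       # a non-zero diagonal entry before reduction
```

The non-zero diagonal of the integer product reflects the fact that $G$ and $H$ share edges (for instance $v_1 v_3$), so $H$ is not a subgraph of $\overline{G}$ and the usual product is not graphical.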


In the following results, we shall consider the graphs $G$ and $H$ on the same set of
vertices $\{v_1, v_2, \ldots, v_n\}$.


Lemma 17.3 The diagonal entries of the matrix product $A(G)A(H)$ are zero if and
only if, for each vertex $v_i$, the number of $G$-$H$ paths from $v_i$ to itself is even.


Proof Let $A = A(G)$, $B = A(H)$, and $C = AB$. Then

\[ c_{ii} = \sum_k a_{ik} b_{ki} \pmod 2. \tag{17.7} \]

As $a_{ik} b_{ki}$ is 1 if $v_i \sim_G v_k \sim_H v_i$, and is 0 otherwise, $\sum_k a_{ik} b_{ki}$ is the number of
$G$-$H$ paths from $v_i$ to itself. The result follows.


Lemma 17.4 The $(i, j)$-entry ($i \neq j$) of the matrix product $A(G)A(H)$ (mod 2) is
either 0 or 1, depending on whether the number of $G$-$H$ paths from $v_i$ to $v_j$ is even
or odd, respectively.


Proof Let $A(G) = [a_{ij}]$, $A(H) = [b_{ij}]$, and $A(G)A(H) = [c_{ij}]$. For $i \neq j$, note
that $a_{ik} b_{kj} = 1$ if and only if there exists a $G$-$H$ path $v_i \sim_G v_k \sim_H v_j$ from $v_i$ to $v_j$.
Now from (17.7), the result follows.


Lemma 17.5 The matrix product $A(G)A(H)$ is symmetric if and only if for each
pair of distinct vertices $v_i$ and $v_j$, the numbers of $G$-$H$ paths and $H$-$G$ paths from
$v_i$ to $v_j$ have the same parity (i.e., either both are even or both are odd).


Proof Note that each of $A(G)$ and $A(H)$ is symmetric, and, therefore, their product
$A(G)A(H)$ is symmetric if and only if $A(G)A(H) = A(H)A(G)$. Now the result
follows from Lemma 17.4.


The characterization theorem in the case of modulo 2 product takes the following
form. The proof is omitted as it is similar to that of Theorem 17.1.


Theorem 17.9 The product $A(G)A(H)$ is graphical if and only if the following
statements are true.

(i) For every $i$ ($1 \le i \le n$), there is an even number of vertices $v_k$ such that
$v_i \sim_G v_k$ and $v_k \sim_H v_i$.

(ii) For each pair of vertices $v_i$ and $v_j$ ($i \neq j$), the numbers of $G$-$H$ paths and $H$-$G$
paths from $v_i$ to $v_j$ have the same parity.


Note 17.12 The $(i, j)$-entry ($i \neq j$) of the matrix product $A(G)A(H)$ is either 0
or 1 depending on whether the number of $G$-$H$ paths from $v_i$ to $v_j$ is even or odd,
respectively.


Recall that, under the usual multiplication, $A(G)A(\overline{G})$ is graphical only when
$G$ is 1-regular, or $(n - 2)$-regular, or is $C_5$, the cycle on 5 vertices
(Theorem 17.8). Example 17.3, given below, shows that even when $G$ is none of the
above three types of graphs, $A(G)A(\overline{G})$ can be graphical when the multiplication is
taken modulo 2.


Example 2


Consider a 2-regular graph $G$ on six vertices and its complement $\overline{G}$, as shown in
Fig. 17.10. Note that

\[
A(G)A(\overline{G}) = \begin{pmatrix}
0&1&1&0&1&1\\
1&0&1&1&0&1\\
1&1&0&1&1&0\\
0&1&1&0&1&1\\
1&0&1&1&0&1\\
1&1&0&1&1&0
\end{pmatrix},
\]

and the cocktail party graph shown in Fig. 17.10 is the graph realizing $A(G)A(\overline{G})$.

[Figure omitted: panels (a) $G$, (b) $\overline{G}$, (c) $\Gamma$]

Fig. 17.10 A 2-regular graph $G$ on 6 vertices, its complement $\overline{G}$, and their modulo 2 matrix
product $\Gamma$


In the case of the modulo 2 matrix product, the following result characterizes the
graphs $G$ for which $A(G)A(\overline{G})$ is graphical.


Theorem 17.10 For any graph $G$ on the set of vertices $\{v_1, v_2, \ldots, v_n\}$, the following
are equivalent:

(i) The matrix product $A(G)A(\overline{G})$ is graphical.
(ii) For every $i$ and $j$, $1 \le i, j \le n$, $\deg_G v_i \equiv \deg_G v_j \pmod 2$.
(iii) The graph $G$ is parity regular.


Proof Note that (ii) $\Leftrightarrow$ (iii) follows from the definition of parity regular graphs.
Now we shall prove (i) $\Leftrightarrow$ (ii). Let $A = A(G)$. From the definitions of the
complement of a graph and the $G$-$H$ path with $H = \overline{G}$,

\[
\deg_G v_i = (\text{number of } G\text{-}G \text{ paths from } v_i \text{ to } v_j)
 + (\text{number of } G\text{-}\overline{G} \text{ paths from } v_i \text{ to } v_j) + a_{ij}, \tag{17.8}
\]

and similarly,

\[
\deg_G v_j = (\text{number of } G\text{-}G \text{ paths from } v_j \text{ to } v_i)
 + (\text{number of } G\text{-}\overline{G} \text{ paths from } v_j \text{ to } v_i) + a_{ij} \tag{17.9}
\]

for every pair of distinct vertices $v_i$ and $v_j$. Note that a $G$-$\overline{G}$ path from $v_j$ to $v_i$ is
a $\overline{G}$-$G$ path from $v_i$ to $v_j$, and that the numbers of $G$-$G$ paths in the two directions
coincide. Then, from Theorem 17.9 and by comparing the right-hand sides of (17.8)
and (17.9), we see that $A(G)A(\overline{G})$ is graphical if and only if
$\deg_G v_i \equiv \deg_G v_j \pmod 2$.
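For small orders, the equivalence (i) $\Leftrightarrow$ (iii) of Theorem 17.10 can be confirmed by exhaustive search. The sketch below enumerates every labelled graph on $n$ vertices; the helper names are illustrative, and the search includes graphs with isolated vertices, for which the matrix statement is checked all the same.

```python
from itertools import combinations


def all_graphs(n):
    """Yield the adjacency matrix of every labelled graph on n vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        a = [[0] * n for _ in range(n)]
        for bit, (i, j) in enumerate(pairs):
            if (mask >> bit) & 1:
                a[i][j] = a[j][i] = 1
        yield a


def complement(a):
    n = len(a)
    return [[0 if i == j else 1 - a[i][j] for j in range(n)] for i in range(n)]


def matmul_mod2(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def is_graphical(m):
    """Symmetric (0,1)-matrix with zero diagonal."""
    n = len(m)
    return all(m[i][i] == 0 for i in range(n)) and all(
        m[i][j] == m[j][i] for i in range(n) for j in range(n))


def is_parity_regular(a):
    return len({sum(row) % 2 for row in a}) == 1


def theorem_17_10_holds(n):
    return all(
        is_graphical(matmul_mod2(a, complement(a))) == is_parity_regular(a)
        for a in all_graphs(n))


print(theorem_17_10_holds(4), theorem_17_10_holds(5))
```

There are only $2^{10} = 1024$ labelled graphs on five vertices, so the search is immediate; it is a sanity check, not a substitute for the proof.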


Remark 17.4 It is also possible to prove (i) $\Rightarrow$ (ii) by taking $A(\overline{G}) =
J - A(G) - I$ in the matrix products $A(G)A(\overline{G})$ and $A(\overline{G})A(G)$, where $J$ is the
$n \times n$ all-ones matrix and $I$ is the $n \times n$ identity matrix.


Example 3


For any two graphs $G$ and $H$ such that $A(G)A(H)$ is graphical under usual matrix
multiplication, from Corollary 17.1, if any of $G$ and $H$ is connected, then the other
is regular. When we consider matrix multiplication under modulo 2, we observe that
the connected graphs $G$ and $\overline{G}$ as shown in Fig. 17.11 do not satisfy this property, as
both the graphs are merely parity regular but neither of them is regular. Note that

\[
A(G)A(\overline{G}) = \begin{pmatrix}
0&0&1&0&1&0\\
0&0&0&1&0&1\\
1&0&0&0&1&0\\
0&1&0&0&0&1\\
1&0&1&0&0&0\\
0&1&0&1&0&0
\end{pmatrix}
\]

is realized by the graph $\Gamma$ shown in Fig. 17.11.


When the matrix multiplication is considered under modulo 2, the following
theorem characterizes graphs $G$ with $A(G)A(\overline{G}) = A(G)$.


Theorem 17.11 Consider a graph $G$ defined on the set of vertices $\{v_1, v_2, \ldots, v_n\}$.
Then $A(G)A(\overline{G}) = A(G)$ if and only if $(A(G))^2$ is either a null matrix or the all-ones
matrix $J$.


Proof Let $A = A(G)$. From Lemma 17.5, taking $H = \overline{G}$, we see that $A(G)A(\overline{G}) =
A(G)$ implies that the number of $G$-$\overline{G}$ paths from $v_i$ to $v_j$, modulo 2, is $a_{ij}$. Now
substituting in (17.8), we get, for each $j \neq i$,

\[ \deg_G v_i \equiv (\text{number of } G\text{-}G \text{ paths from } v_i \text{ to } v_j) \pmod 2. \tag{17.10} \]

From Theorem 17.10, we know that $G$ is parity regular and therefore $\deg_G v_i \equiv
\deg_G v_j \pmod 2$. Thus, from (17.10), $(A(G))^2$ is either 0 or $J$.

[Figure omitted: panels (a) $G$, (b) $\overline{G}$, (c) $\Gamma$]

Fig. 17.11 Modulo 2 matrix product of a non-regular graph and its complement


Conversely, suppose that $(A(G))^2$ is either 0 or $J$. If $(A(G))^2 = 0$, then the degrees
of all the vertices in $G$ are even and if $(A(G))^2 = J$, then the degrees of all the vertices
are odd. By taking $A(\overline{G}) = J - A(G) - I$ ($= J + A(G) + I$, since subtraction and
addition are equivalent modulo 2), we get

\[
\begin{aligned}
A(G)A(\overline{G}) &= A(G)(J + A(G) + I) \\
 &= A(G)J + (A(G))^2 + A(G).
\end{aligned}
\]

In either of the cases $(A(G))^2 = 0$ and $(A(G))^2 = J$, the right-hand side of the above
reduces to $A(G)$ modulo 2.
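Theorem 17.11 can likewise be spot-checked by brute force. The sketch below, with illustrative helper names, tests the stated equivalence on every labelled graph with four or five vertices, working modulo 2 throughout.

```python
from itertools import combinations


def all_graphs(n):
    """Yield the adjacency matrix of every labelled graph on n vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        a = [[0] * n for _ in range(n)]
        for bit, (i, j) in enumerate(pairs):
            if (mask >> bit) & 1:
                a[i][j] = a[j][i] = 1
        yield a


def complement(a):
    n = len(a)
    return [[0 if i == j else 1 - a[i][j] for j in range(n)] for i in range(n)]


def matmul_mod2(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def equivalence_holds(a):
    """A(G)A(G-bar) = A(G) (mod 2)  iff  A(G)^2 is the zero or all-ones matrix."""
    n = len(a)
    zero = [[0] * n for _ in range(n)]
    ones = [[1] * n for _ in range(n)]
    left = matmul_mod2(a, complement(a)) == a
    right = matmul_mod2(a, a) in (zero, ones)
    return left == right


print(all(equivalence_holds(a) for n in (4, 5) for a in all_graphs(n)))
```

A concrete instance is the cycle $C_4$: its square vanishes modulo 2, and its product with the complementary perfect matching returns $A(C_4)$.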


The corollary given below follows from Theorem 17.11, and also characterizes
the graphs $G$ with the property $A(G)A(\overline{G}) = A(G)$ in terms of properties of $\overline{G}$.


Corollary 17.4 Consider a graph $G$ such that $A(G) = A$ and $A(\overline{G}) = B$. Then the
following statements are equivalent:

(i) $AB = A$.
(ii) $B^2 = I$ or $J - I$.
(iii) $\overline{G}$ is a graph with one of the following properties:

(a) Each vertex has an odd degree and the number of paths of length 2 between
each pair of vertices is even.

(b) Each vertex has an even degree and the number of paths of length 2 between
each pair of vertices is odd.

356 G. Sudhakara et al. 


Proof (i) $\Rightarrow$ (ii). From Theorem 17.11, $AB = A$ implies that $A^2$ is either a null
matrix or $J$. By taking $A = J - B - I$, we have $A^2 = J^2 + B^2 + I$. Further, note
that $J^2$ is either a null matrix or $J$ itself, depending on whether the number of
vertices on which the graph is defined is even or odd. In either case, $A^2 = 0$ or $J$ implies
$B^2 = I$ or $J - I$.

(ii) $\Rightarrow$ (i). Noting that $J^2 = 0$ or $J$ itself, by suitable substitution for $B^2$, we have
$A^2 = J^2 + B^2 + I = 0$ or $J$. Hence, we obtain (i) from Theorem 17.11.

As (iii) is the graph theoretical interpretation of (ii), (ii) $\Leftrightarrow$ (iii) is trivial.


The following corollary characterizes the graphs $G$ for which $A(G)A(\overline{G}) = A(\overline{G})$
and the proof is immediate from Corollary 17.4.


Corollary 17.5 Consider a graph $G$ such that $A(G) = A$ and $A(\overline{G}) = B$. Then the
following statements are equivalent:

(i) $AB = B$.
(ii) $A^2 = I$ or $J - I$.

Note that the adjacency matrix $A(G)$ of a graph $G$ is not idempotent under the usual
multiplication, unless $G$ is totally disconnected. The following theorem determines when
$(A(G))^2 = A(G)$ if the product is taken under modulo 2.


Theorem 17.12 Let $G$ be a graph with adjacency matrix $A(G)$. Then the following
statements are equivalent:

(i) $(A(G))^2 = A(G)$, i.e., $A(G)$ is idempotent.
(ii) $A(G)A(\overline{G}) = 0$ and the degree of every vertex in $G$ is even.
(iii) The number of $G$-$\overline{G}$ paths of length 2 between every pair of vertices is even,
and the degree of every vertex in $G$ is even.


Proof (i) $\Rightarrow$ (ii). That $(A(G))^2$ is graphical implies that the diagonal entries of
$(A(G))^2$ are zeros and, in turn, that the degree of each vertex in $G$ is zero modulo
2. In other words, the degree of each vertex is even. So, we have $A(G)J = 0$ and
therefore $A(G)A(\overline{G}) = A(G)(J - A(G) - I) = (A(G))^2 - A(G) = 0$ whenever
$(A(G))^2 = A(G)$. (ii) $\Rightarrow$ (iii) follows from Lemma 17.5 and (iii) $\Rightarrow$ (i) follows
from Theorem 17.10.

[Figure omitted: panel (a) $G$]

Fig. 17.12 A graph $G$ for which $A(G)A(G) = A(G)$


Example 4


The graph $G$ as shown in Fig. 17.12 is such that $(A(G))^2 = A(G)$ and the degree of
every vertex of $G$ is even. Further, $A(G)A(\overline{G}) = 0$.


17.4 Some Matrix Equations of Graphs 


In the last section, we have seen the graph realizing the product of adjacency matrices
of two graphs and the related results. In this section, we consider the product of other
matrices associated with graphs and the equations involving this product. As a
by-product, we get characterizations of complete bipartite graphs and cycle graphs.
In all the matrix equations we consider, the adjacency matrix $A(G)$ is a factor on the
left-hand side. An isolated vertex in a graph results in a zero row in its adjacency
matrix, so the product on the left-hand side also has a zero row, which implies that
the graph realizing the product has an isolated vertex. To avoid this situation, we
consider graphs without isolated vertices.

For the sake of convenience, we say that a vertex $v$ is adjacent to an edge $e$ in a
graph $G$ if the vertex $v$ is either incident with the edge $e$ or else adjacent to an
end vertex of $e$.

We mainly deal with the following matrix equations of graphs.

(i) Characterize the graphs $G$ for which $A(G)B(G) = B(H)$, for some graph $H$.
(ii) Characterize the graphs $G$ and the size of the matrix $J$ for which $A(G)B(G) = J$,
where $J$ is a rectangular matrix with all entries equal to one.
(iii) Characterize the graphs $G$ for which $A(G)B(G)$ is graphical, i.e., $A(G)B(G) =
A(\Gamma)$ for some graph $\Gamma$.


Since the adjacency matrix of a cycle graph is circulant, we note the following
properties of circulant matrices, which are useful later.


Definition 17.9 An $n \times n$ matrix $S$ is said to be a circulant matrix if

\[ s_{ij} = s_{1k}, \quad \text{where } 1 \le k \le n \text{ and } j - i + 1 \equiv k \pmod n. \tag{17.11} \]

Further, a circulant matrix is represented by its first row.


Remark 17.5 In the following, we remark on certain properties of circulant matrices
which follow from the definition of a circulant matrix.

(i) Sums and products of circulant matrices are circulant.

(ii) A (0, 1)-circulant matrix represented by $(a_1, a_2, \ldots, a_n)$ is graphical (is an
adjacency matrix of some graph) if and only if

\[ a_1 = 0 \quad \text{and} \quad a_i = a_k \text{ for all } k + i \equiv n + 2 \pmod n. \tag{17.12} \]

(iii) For the cycle graph $G = v_1 \sim v_4 \sim v_5 \sim v_2 \sim v_3 \sim v_1$, $A(G)$ is not circulant.

(iv) From Eq. (17.12) in (ii), it is clear that if a circulant matrix $(a_1, a_2, \ldots, a_n)$
is $A(G)$ for some 2-regular graph, then $a_i = a_k = 1$ for exactly one pair of
$i, k \ge 2$ such that $i + k \equiv n + 2 \pmod n$ and $a_j = 0$ for all other $j$.

(v) For $2 \le i < k$, let $C_{(i,k)}$ represent a circulant matrix $(a_1, a_2, \ldots, a_n)$ with
$a_i = a_k = 1$ and $a_j = 0$ for all other $j$. Then $C_{(2,n)}$ is the adjacency matrix of
the cycle graph $v_1 \sim v_2 \sim \cdots \sim v_n \sim v_1$.

(vi) For $2 \le i < k$, $C_{(i,k)}$ represents an adjacency matrix of a cycle graph $G$ on $n$
vertices if and only if $i + k \equiv n + 2 \pmod n$ and the integers $i - 1$ and $n$ are
relatively prime. Note that $i - 1$ and $n$ are relatively prime $\Leftrightarrow$ the walk
$v_1 \sim v_{1+(i-1)} \sim v_{1+2(i-1)} \sim \cdots \sim v_{1+p(i-1)} \sim \cdots$ has all the vertices of the
graph $\Leftrightarrow$ the graph is connected, and hence the 2-regular graph $G$ is a
cycle.

(vii) For $2 \le i < n$, $C_{(i,i+1)}$ is an adjacency matrix of a graph $G$ (so, $2i + 1 \equiv n + 2
\pmod n$) if and only if $G$ is a cycle on an odd number of vertices and $i = d + 1$,
where $d$ is the diameter of the graph.

(viii) For $1 < i < n$, let $C_{(i-1,i,i+1,i+2)}$ denote a circulant matrix $(a_1, a_2, \ldots, a_n)$
such that $a_{i-1} = a_i = a_{i+1} = a_{i+2} = 1$ and $a_j = 0$ for all other $j$. Note that
for $n = 4$, the circulant matrix $C_{(i-1,i,i+1,i+2)}$ is trivially symmetric. For $n \ge 5$,
we have $(i - 1) + (i + 2) = i + (i + 1) = n + 2$ for any $n$ if and only if $n$
is odd and $i = \frac{n+1}{2}$. Then $(a_1, a_2, \ldots, a_n)$ is graphical if and only if $n$ is odd
and $i = \frac{n+1}{2}$.

Let $G$ be a graph with $V(G) = \{v_1, v_2, \ldots, v_n\}$ and $E(G) = \{e_1, e_2, \ldots, e_m\}$. Let
$A(G)$ and $B(G)$ be the adjacency matrix and incidence matrix of $G$, respectively.
Consider the equation $A(G)B(G) = C$. We try to understand the nature of $C$.


Theorem 17.13 If $A(G)B(G) = C$, then the following assertions hold.

(i) The entry $c_{ip}$ is non-zero if and only if the vertex $v_i$ is either incident or adjacent
to the edge $e_p$.
(ii) The number of non-zero entries in the $p$th column of $C$ represents the number
of vertices adjacent to $e_p$.
(iii) Matrix $C$ always dominates $B(G)$ entrywise, i.e., $(C)_{ij} \ge (B(G))_{ij}$ for all $i, j$.
(iv) $C$ is a (0, 1)-matrix if and only if $G$ is triangle free.


Proof Let $(A(G))_{ij} = a_{ij}$, $(B(G))_{kl} = b_{kl}$, and $(C)_{kl} = c_{kl}$ for all possible $i, j, k, l$.
For any $p$ ($1 \le p \le m$), let the edge $e_p$ be incident with the vertices $v_k$ and $v_l$ where
$k < l$, in which case $b_{kp} = b_{lp} = 1$:

\[
\begin{aligned}
c_{ip} &= \sum_j a_{ij} b_{jp} \\
 &= a_{ik} b_{kp} + a_{il} b_{lp} && (17.13) \\
 &= a_{ik} + a_{il}. && (17.14)
\end{aligned}
\]

From (17.14), it is clear that $c_{ip}$ represents the number of end vertices of $e_p$ with
which the vertex $v_i$ is adjacent, i.e., the number of 1s among $a_{ik}$ and $a_{il}$. So, (i) of
the theorem holds. For the same reason, part (ii) holds.

For $b_{ip} = 1$, we have $i = k$ or $l$. If $i = k$, we have $a_{ik} = a_{ii} = 0$ and $a_{il} = 1$. So,
from (17.14) we get $c_{ip} = 1$. Hence part (iii) of the theorem follows.

Again referring to (17.14), we see that $c_{ip} = 2$ if and only if both $a_{ik}$ and $a_{il}$ are
equal to one, i.e., $v_i$, $v_k$, and $v_l$ induce a triangle with $e_p = (v_k, v_l)$ as an edge in it.
This proves part (iv) of the theorem.


Note 17.13 The number of 2s in the matrix $C$ equals three times the number of
triangles in the graph $G$.
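Theorem 17.13 and Note 17.13 can be illustrated on a small example. The sketch below builds $A(G)$ and $B(G)$ from an edge list (0-based vertex labels, an assumption of this sketch) and counts the entries equal to 2 in $C = A(G)B(G)$ for $K_4$, which contains four triangles.

```python
from itertools import combinations


def adjacency(n, edges):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def incidence(n, edges):
    """n x m vertex-edge incidence matrix B(G); column p belongs to edges[p]."""
    b = [[0] * len(edges) for _ in range(n)]
    for p, (u, v) in enumerate(edges):
        b[u][p] = b[v][p] = 1
    return b


def adjacency_times_incidence(n, edges):
    a, b = adjacency(n, edges), incidence(n, edges)
    return [[sum(a[i][j] * b[j][p] for j in range(n))
             for p in range(len(edges))] for i in range(n)]


def count_triangles(n, edges):
    a = adjacency(n, edges)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if a[i][j] and a[j][k] and a[i][k])


k4_edges = list(combinations(range(4), 2))
c_k4 = adjacency_times_incidence(4, k4_edges)
twos = sum(row.count(2) for row in c_k4)
print(twos, 3 * count_triangles(4, k4_edges))

c5_edges = [(i, (i + 1) % 5) for i in range(5)]
c_c5 = adjacency_times_incidence(5, c5_edges)   # triangle free: a (0,1)-matrix
```

Each triangle contributes one entry 2 for each of its three (vertex, opposite edge) pairs, which is where the factor three comes from.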


We notice that, when $G$ has $n$ vertices and $m$ edges, $A(G)B(G)$ is a matrix of order
$n \times m$. Now we consider the question: can $A(G)B(G)$ be $B(H)$ for some graph $H$
on $n$ vertices having $m$ edges? The next theorem answers this question.


Theorem 17.14 $A(G)B(G) = B(H)$ for some graph $H$ if and only if $B(G) =
B(H)$ and $G = H$ is a 1-factor graph.


Proof From (17.14) in Theorem 17.13, it is clear that $c_{ip}$ in the $i$th row of $C$, the
product $A(G)B(G)$, is the number of end vertices of $e_p$ with which the vertex $v_i$ is
adjacent, i.e., the number of 1s among $a_{ik}$ and $a_{il}$. Now referring to (iii) of Theorem
17.13, we see that $C$ dominates $B(G)$ entrywise. Since an incidence matrix has
exactly two 1s in each of its columns, $C$ is an incidence matrix of some graph $H$ if
and only if $C = B(G)$.

Again referring to (ii) of Theorem 17.13, $C = B(G)$ implies that the vertices adjacent
to any edge are only those vertices which are incident with it. So, no two edges in the
graph are adjacent to each other. Since no vertex in the graph is an isolated vertex,
the degree of each vertex is exactly one and therefore the graph is a 1-factor graph.


It is interesting to search for a graph $G$, if it exists, satisfying $A(G)B(G) = J$ where
$J$ is an $n \times m$ matrix with all entries equal to 1. We have the following theorem.


Theorem 17.15 Given a graph $G$ with $n$ vertices and $m$ edges, consider the matrix
equation $A(G)B(G) = C$. Then the following statements are equivalent.

(i) $C = J_{n \times m}$, where $m = k(n - k)$ for some $1 \le k < n$ and $J_{n \times m}$ is the matrix
with all entries equal to one.
(ii) For every pair of a vertex $v_i$ and an edge $e_j$ of the graph, the vertex $v_i$ is adjacent
to exactly one end vertex of $e_j$.
(iii) $G$ is a complete bipartite graph with diameter 2.


Proof (i) $\Rightarrow$ (ii). From the definition of $J$, the entries of every $j$th column are
one, and from (ii) of Theorem 17.13, we know that every vertex $v_i$ is adjacent to the
edge $e_j$. Again referring to (iv) of Theorem 17.13, we see that the graph is triangle
free and therefore $v_i$ cannot be adjacent to both the end vertices of an edge $e_j$. So,
(ii) holds.

(ii) $\Rightarrow$ (iii). If $v_i$ and $v_j$ are adjacent, then the vertices are at distance one.
Otherwise, let $v_i$ and $v_j$ ($i \neq j$) be any non-adjacent vertices in the graph. Since $v_j$
is not an isolated vertex (from the choice of $G$), there exists an edge $v_j \sim v_k$ in the
graph. Now from (ii) of the theorem, we see that $v_i$ is adjacent to $v_k$, and therefore $v_i$
and $v_j$ are at distance two from each other. Taking a vertex $v_1$ of the graph, partition
the vertex set into $V_1$ and $V_2$ such that $V_1$ consists of all the vertices at distance one
from $v_1$ and $V_2$ consists of $v_1$ and all the other vertices at distance two from $v_1$. From
(ii), we see that $G$ is triangle free and therefore no two vertices from $V_1$ are adjacent.
Since the vertices from $V_2$ are not adjacent to $v_1$, from (ii) we know that they are
adjacent to every vertex of $V_1$. Again the triangle free property of the graph proves the
non-adjacency between any two distinct vertices of $V_2$. So, $G$ is a complete bipartite
graph.

(iii) $\Rightarrow$ (i). Let $G$ be a complete bipartite graph with partition $V_1$ and $V_2$ having
$k$ and $n - k$ vertices, respectively. So, the graph has $k(n - k)$
edges, and the matrices $B(G)$ and $C$ in the matrix product are of size $n \times m$, where
$m = k(n - k)$. Since a complete bipartite graph is triangle free and every vertex is
adjacent to every edge of the graph, we have $C = J$ with all the entries equal to
one.
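The implication (iii) $\Rightarrow$ (i) of Theorem 17.15 can be seen concretely on $K_{2,3}$. The sketch below uses illustrative helper names and 0-based vertex labels; the path $P_4$ serves as a bipartite but not complete bipartite counterexample.

```python
def adjacency_times_incidence(n, edges):
    """C = A(G) B(G) for the graph on vertices 0..n-1 with the given edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    b = [[0] * len(edges) for _ in range(n)]
    for p, (u, v) in enumerate(edges):
        b[u][p] = b[v][p] = 1
    return [[sum(a[i][j] * b[j][p] for j in range(n))
             for p in range(len(edges))] for i in range(n)]


def complete_bipartite_edges(k, n):
    """Edges of K_{k, n-k} with parts {0..k-1} and {k..n-1}."""
    return [(u, v) for u in range(k) for v in range(k, n)]


edges_k23 = complete_bipartite_edges(2, 5)          # m = k(n - k) = 6 edges
c_k23 = adjacency_times_incidence(5, edges_k23)
print(all(x == 1 for row in c_k23 for x in row))    # C = J of size 5 x 6

edges_p4 = [(0, 1), (1, 2), (2, 3)]                 # path on four vertices
c_p4 = adjacency_times_incidence(4, edges_p4)       # contains a zero entry
```

In $P_4$ the end vertex $v_1$ is adjacent to neither end of the edge $v_3 v_4$, so statement (ii) fails and a zero appears in the product.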


Now we consider a matrix equation $A(G)B(G) = C$, where $C$ is an adjacency
matrix of some graph $\Gamma$. Note that $G$ should have $n$ edges, so that the product is a
square matrix. From the structures of $A(G)$ and $B(G)$, it is clear that $A(G)B(G) = C$
is graphical if and only if $A(H)B(H)$ is graphical for every connected component
$H$ of $G$. So, we assume that $G$ in the following discussions is connected, unless
indicated otherwise.

If a connected graph $G$ has $n$ vertices and $n$ edges, then the graph is either a
cycle graph or else it is a unicyclic graph. In the following theorem, we characterize
a cycle graph on $n$ vertices in terms of a matrix equation.


Theorem 17.16 Given a graph $G$ on $n$ ($\ge 4$) vertices, the following statements are
equivalent.

(i) $G$ is a cycle on four or more vertices.
(ii) $A(G)B(G)$ is an $n \times n$ matrix in which each column has exactly four ones.
(iii) $A(G)B(G)$ is an $n \times n$ matrix in which each row has exactly four ones.


Proof (i) $\Rightarrow$ (ii). Let $G$ be a cycle on four or more vertices. Let $v_{i_1} \sim v_{i_2} \sim v_{i_3} \sim
v_{i_4}$ and $e_p = v_{i_2} \sim v_{i_3}$, in which case from (17.13) we have

\[
\begin{aligned}
c_{i_1 p} &= a_{i_1 j_1} b_{j_1 p} + a_{i_1 i_2} b_{i_2 p} = 1, \\
c_{i_2 p} &= a_{i_2 i_1} b_{i_1 p} + a_{i_2 i_3} b_{i_3 p} = 1, \\
c_{i_3 p} &= a_{i_3 i_2} b_{i_2 p} + a_{i_3 i_4} b_{i_4 p} = 1, \\
c_{i_4 p} &= a_{i_4 i_3} b_{i_3 p} + a_{i_4 j_2} b_{j_2 p} = 1,
\end{aligned}
\tag{17.15}
\]

where $v_{j_1}$ and $v_{j_2}$ are the vertices, different from $v_{i_2}$ and $v_{i_3}$, adjacent to $v_{i_1}$ and $v_{i_4}$,
respectively. Further, for every other $j \notin \{i_1, i_2, i_3, i_4\}$ we note that $v_j$ is not adjacent
to $e_p$ and hence $c_{jp} = 0$. Therefore each column has exactly four 1s.

(ii) $\Rightarrow$ (i). Let $A(G)B(G)$ be an $n \times n$ matrix in which each column has
exactly four 1s. From (iv) of Theorem 17.13, $G$ is a triangle free graph and therefore
the graph has four or more vertices. Suppose that $G$ is not a cycle; then there exists at
least one vertex not on the cycle subgraph of the unicyclic graph. Since the graph
has $n$ vertices and $n$ edges and is triangle free, there exist distinct vertices $w_1, w_2, w_3, w_4$,
and $w_5$ in the graph $G$ such that $w_1 \sim w_2 \sim w_3 \sim w_4$ is on the cycle and $w_3 \sim w_5$ is
not on the cycle. Now, the edge $w_2 \sim w_3$ has at least five distinct vertices adjacent to
it. So the corresponding column has five 1s, a contradiction. Therefore $G$ is a cycle.

(i) $\Rightarrow$ (iii). Using the fact that every vertex is adjacent to exactly four edges in
a cycle, the proof is along the same lines as the proof of (i) $\Rightarrow$ (ii).

(iii) $\Rightarrow$ (i). If the graph is unicyclic, then take the vertices $w_1, w_2, w_3, w_4$, and
$w_5$ as in the proof of (ii) $\Rightarrow$ (i) and observe that the row corresponding to
$w_3$ has five or more 1s.
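The row and column counts in Theorem 17.16 can be checked for a range of cycles, and contrasted with a unicyclic graph that is not a cycle. In the sketch below the helper names and the 0-based vertex labels are illustrative assumptions; the unicyclic example is $C_4$ with one pendant vertex attached.

```python
def adjacency_times_incidence(n, edges):
    """C = A(G) B(G) for the graph on vertices 0..n-1 with the given edge list."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    b = [[0] * len(edges) for _ in range(n)]
    for p, (u, v) in enumerate(edges):
        b[u][p] = b[v][p] = 1
    return [[sum(a[i][j] * b[j][p] for j in range(n))
             for p in range(len(edges))] for i in range(n)]


def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]


def ones_per_column(c):
    return [sum(1 for row in c if row[p] == 1) for p in range(len(c[0]))]


def ones_per_row(c):
    return [row.count(1) for row in c]


for n in range(4, 10):
    c = adjacency_times_incidence(n, cycle_edges(n))
    print(n, ones_per_column(c) == [4] * n, ones_per_row(c) == [4] * n)

# C_4 with a pendant vertex 4 attached to vertex 0: n = 5 vertices, 5 edges.
unicyclic_edges = cycle_edges(4) + [(0, 4)]
c_uni = adjacency_times_incidence(5, unicyclic_edges)
```

In the unicyclic example the column of the edge $v_1 v_2$ and the row of $v_1$ (vertex 0 in the code) both contain five 1s, matching the argument in the proof.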


We adopt the lines in the proof of Theorem 17.16, suitably, to prove the following 
theorem. 


Theorem 17.17 Given a graph G on n > 4 vertices, the following statements are 
equivalent. 


(i) G has a cycle subgraph on the vertices vj,, Vj,,..., Ui, and with the edges 
Cpr prs ++ +> Cp, (k = 4). 
(ii) In the k x k sub-matrix of A(G)B(G), with rows determined by i, i2,..., ix 
and columns determined by p, p2, ..., Px, each column has exactly four 1s. 
(iii) In the k x k sub-matrix of A(G)B(G), with rows determined by i, i2,..., ix 
and columns determined by p,, p2,..-, Px, each row has exactly four 1s. 


Note that if G is a cycle graph on three vertices, then A(G)B(G) has three 2s, 
and if G is a cycle graph on four vertices then A(G)B(G) = J4. In both cases, the 
product is not the adjacency matrix of any graph. 
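
The small cases in this note, and the four-1s pattern for longer cycles, can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text. It assumes a particular labelling (vertices 0, ..., n-1, with edge p joining vertices p and p+1 mod n); the helper names `cycle_matrices`, `matmul` and `product_AB` are hypothetical.

```python
def cycle_matrices(n):
    """Adjacency matrix A and vertex-edge incidence matrix B of the cycle C_n.

    Assumed labelling for this sketch: vertices 0..n-1, edge p joins p and (p+1) % n.
    """
    A = [[0] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for p in range(n):
        q = (p + 1) % n
        A[p][q] = A[q][p] = 1   # v_p ~ v_q
        B[p][p] = B[q][p] = 1   # edge e_p is incident with v_p and v_q
    return A, B


def matmul(X, Y):
    """Plain integer matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def product_AB(n):
    """The product A(C_n) B(C_n) under the labelling above."""
    A, B = cycle_matrices(n)
    return matmul(A, B)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n)
        for row in product_AB(n):
            print("  ", row)
```

Under this labelling, the product for n = 3 contains 2s, the product for n = 4 is the all-ones matrix, and from n = 5 onward every row and column contains exactly four 1s.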


Lemma 17.6 Let $v_{i-2} \sim v_{i-1} \sim v_i \sim v_{i+1} \sim v_{i+2} \sim v_{i+3}$ be a walk in a graph $G$
in which the vertices are distinct except possibly for the pair $v_{i-2}$ and $v_{i+3}$. Let $C =
A(G)B(G)$. Then


$e_p = v_i \sim v_{i+1}$ if and only if $c_{i-1,p} = c_{i,p} = c_{i+1,p} = c_{i+2,p} = 1$. (17.16)

Proof If $e_p = v_i \sim v_{i+1}$, from Eq. (17.15) we have


$c_{i-1,p} = a_{i-1,i-2} b_{i-2,p} + a_{i-1,i} b_{i,p} = 1,$
$c_{i,p} = a_{i,i-1} b_{i-1,p} + a_{i,i+1} b_{i+1,p} = 1,$ (17.17)
$c_{i+1,p} = a_{i+1,i} b_{i,p} + a_{i+1,i+2} b_{i+2,p} = 1,$
$c_{i+2,p} = a_{i+2,i+1} b_{i+1,p} + a_{i+2,i+3} b_{i+3,p} = 1.$


Conversely, if the equations in (17.17) hold, it is clear that the edge $e_p$ in the cycle
graph $G$ is incident with two distinct vertices among $v_{i-2}, v_{i-1}, v_i, v_{i+1}, v_{i+2}, v_{i+3}$. Since
the vertices on the given walk are distinct except possibly for the pair $v_{i-2}$ and $v_{i+3}$,
$e_p = v_{i-2} \sim v_{i-1}$ implies $b_{i,p} = b_{i+2,p} = 0 = c_{i+1,p}$. In fact, for $e_p \ne v_i \sim v_{i+1}$,
at least one equation in (17.17) does not hold. So, $e_p = v_i \sim v_{i+1}$.


The following corollary follows immediately. 


Corollary 17.6 Let $G$ be a graph free from any triangle and cycle of length four. If
$v_{i_1} \sim v_{i_2} \sim v_{i_3} \sim v_{i_4}$ is a path, then $e_p = v_{i_2} \sim v_{i_3}$ if and only if $c_{i_1 p} = c_{i_2 p} = c_{i_3 p} =
c_{i_4 p} = 1$.


Theorem 17.18 Let $G$ be a cycle graph on $n \ge 5$ vertices with $A(G) = C_{(2,n)}$ and
$d = \lfloor \frac{n}{2} \rfloor$, the integer part of $\frac{n}{2}$. Then the following statements are equivalent.


(i) $A(G)B(G) = C$ is graphical.

(ii) $C$ is the circulant matrix $C_{(k-1,k,k+1,k+2)}$, where $k = d + 1$ and $n = 2d + 1$.
In other words, $C = A(\Gamma)$, where $\Gamma$ is a 4-regular graph in which $v_i \sim_\Gamma
v_{d+i-1}, v_{d+i}, v_{d+i+1}, v_{d+i+2}$.

(iii) $G$ is a graph on an odd number of vertices and edge $e_i$ is incident with the vertices
$v_{d+i}$ and $v_{d+i+1}$ for all $i$.

(iv) $B(G) = A(H)$ for some 2-regular graph $H$ in which $v_i \sim_H v_{d+i}, v_{d+i+1}$ for
all $i$. In other words, $A(H) = C_{(d+1,d+2)}$.


Proof (i) $\Rightarrow$ (ii). Let $A(G)B(G) = C$ and $C$ be graphical. From Theorem 17.16,
$G$ is a cycle graph on five or more vertices, implying that each column and each row of the matrix $C = A(G)B(G)$
has exactly four 1s. Since $A(G)$ is $C_{(2,n)}$, from Eq. (17.16) in Lemma 17.6 we have


$e_p = v_i \sim v_{i+1}$ if and only if $c_{i-1,p} = c_{i,p} = c_{i+1,p} = c_{i+2,p} = 1$. (17.18)


Therefore, the $i$th row has leading non-zero entry only in the $p$th column and no two
sequences of four 1s in different columns of $C$ are in the same alignment. Now from
the symmetry of $C$, a non-zero entry in any row is a part of a sequence of four 1s
in that row and the leading 1 of a column is the terminating 1 of the corresponding
row. So, in the $(p + 1)$th column we have


$c_{i,p+1} = c_{i+1,p+1} = c_{i+2,p+1} = c_{i+3,p+1} = 1$ and $e_{p+1} = v_{i+1} \sim v_{i+2}$. (17.19)


So, the matrix $C = A(G)B(G)$ is a circulant matrix of the form $(a_1, a_2, \ldots, a_n)$
with $a_{k-1} = a_k = a_{k+1} = a_{k+2} = 1$ and $a_j = 0$ for all other $j$, whenever $e_1 = v_k \sim_G
v_{k+1}$. In other words, $C$ is of the form $C_{(k-1,k,k+1,k+2)}$ as defined in (viii) of Remark
17.5. Therefore, $C = A(\Gamma)$ for some graph $\Gamma$ (graphical) implies that $n$ is odd,
$k = d + 1$, and $\Gamma$ is as described in (ii).

(ii) $\Rightarrow$ (i) follows trivially.

(ii) $\Rightarrow$ (iii). From (ii), it is clear that $G$ is a cycle graph on an odd number of
vertices and $A(G)B(G)$ is $C_{(d,d+1,d+2,d+3)}$. Now referring to (17.16) in Lemma 17.6, we
see that the edge $e_i$ is incident with the vertices $v_{d+i}$ and $v_{d+i+1}$ for all $i$ in the graph
$G$.

(iii) $\Rightarrow$ (iv). From (iii), $n$ is odd and therefore $C_{(d+1,d+2)}$ is graphical (refer to
(vii) of Remark 17.5). Again, from the description of $B(G)$ as in (iii) it is clear that
$B(G) = C_{(d+1,d+2)}$ and equals $A(H)$, for the graph $H$ as described in (iv).

(iv) $\Rightarrow$ (ii). Let $C = A(G)B(G)$. From Eq. (17.16) of Lemma 17.6, $A(G) =
C_{(2,n)}$ and $B(G) = C_{(d+1,d+2)}$ imply that $C'$ is $C_{(d,d+1,d+2,d+3)}$ ($C'$ denotes the
transpose of $C$). From (iv), we see that $B(G) = C_{(d+1,d+2)}$ is graphical and therefore
$n$ is odd. Now referring to (viii) of Remark 17.5, we see that $C' = C_{(d,d+1,d+2,d+3)}$
is graphical, thus proving (ii).


Remark 17.6 If $G$ is a cycle graph on an odd number of vertices such that $A(G)$ is
$C_{(2,n)}$ then, in Theorem 17.18, we have proved that the edges of $G$ have a particular
labeling such that $A(G)B(G)$ is graphical. In this case, $B(G) = A(H)$ for a particular
cycle subgraph $H$ of $\overline{G}$. In fact, every vertex is adjacent to two vertices in $H$ which are
at distance $d$ (diagonally opposite vertices) in $G$. Similarly, the vertices in the graph
$\Gamma$ such that $A(G)B(G) = A(\Gamma)$ are adjacent to the four vertices that are at distance
$d - 1$ or $d$ in $G$. An example of $G$, $H$, and $\Gamma$ which satisfy $A(G)B(G) = A(\Gamma)$ and
$B(G) = A(H)$ is given below.


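Note that, with rows indexed by $v_1, \ldots, v_7$ and columns indexed by $v_1, \ldots, v_7$ (for $A(G)$ and $A(\Gamma)$) or by $e_1, \ldots, e_7$ (for $B(G)$),

$$A(G) = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}, \qquad
B(G) = A(H) = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0
\end{pmatrix},$$

and

$$A(\Gamma) = \begin{pmatrix}
0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0
\end{pmatrix}$$

satisfying $A(G)B(G) = A(\Gamma)$.

[Figure: (a) $G$, (b) $H$, (c) $\Gamma$]
Fig. 17.13 Graphs $G$, $H$, and $\Gamma$ satisfying $A(G)B(G) = A(\Gamma)$ and $B(G) = A(H)$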
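
The circulant identities used in Theorem 17.18 and in the example above can be verified directly for small $n$. The sketch below is illustrative only and is not taken from the original text. It assumes that $C_{(p_1, \ldots, p_r)}$ denotes the circulant whose first row has 1s in the 1-based positions $p_1, \ldots, p_r$, each later row being a cyclic right shift of the previous one; the helper names are hypothetical.

```python
def circulant(n, ones):
    """n x n circulant 0/1 matrix whose first row has 1s at the 1-based positions in `ones`."""
    first = [1 if (j + 1) in ones else 0 for j in range(n)]
    # Row i is the first row shifted cyclically to the right by i places.
    return [[first[(j - i) % n] for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Plain integer matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def is_graphical(M):
    """True if M is a symmetric (0,1) matrix with zero diagonal."""
    n = len(M)
    return all(M[i][i] == 0 for i in range(n)) and all(
        M[i][j] in (0, 1) and M[i][j] == M[j][i]
        for i in range(n) for j in range(n)
    )


def cycle_product(n):
    """A(G) B(G) for A(G) = C_(2,n) and B(G) = C_(d+1,d+2), with d = n // 2."""
    d = n // 2
    return matmul(circulant(n, {2, n}), circulant(n, {d + 1, d + 2}))


if __name__ == "__main__":
    for n in (5, 6, 7, 9):
        d = n // 2
        C = cycle_product(n)
        print(n, C == circulant(n, {d, d + 1, d + 2, d + 3}), is_graphical(C))
```

For odd $n$ the product equals $C_{(d,d+1,d+2,d+3)}$ and is symmetric; for even $n$ the same circulant form appears but it is not symmetric, which matches the role of the parity condition in the theorem.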


In the following theorem, we provide a characterization of cycle graphs and cycle
graphs on an odd number of vertices. The proof is immediate from the fact that given
any cycle graph $G$ with an adjacency matrix $A(G)$, there exists a permutation matrix
$P$ such that $PA(G)P' = C_{(2,n)}$, and if $n$ is odd then there exists a permutation
matrix $Q$ such that $PB(G)Q = C_{(d+1,d+2)}$ (Fig. 17.13).


Theorem 17.19 Let $G$ be a connected graph on $n \ge 4$ vertices. For the adjacency
matrix $A(G)$ and incidence matrix $B(G)$ with reference to the given labeling of
vertices and edges, the following statements hold.

(i) $G$ is a cycle if and only if there exist permutation matrices $P$ and $Q$ such that
$PA(G)B(G)Q = C_{(d,d+1,d+2,d+3)}$, where $d = \lfloor \frac{n}{2} \rfloor$. In this case, $PA(G)P' =
C_{(2,n)}$ and $PB(G)Q = C_{(d+1,d+2)}$.

(ii) $G$ is a cycle on an odd number of vertices if and only if $PA(G)B(G)Q =
C_{(d,d+1,d+2,d+3)}$ in (i) is symmetric (graphical).


The following theorem characterizes each component of the matrix equation
$A(G)B(G) = A(\Gamma)$.


Theorem 17.20 Let $G$ be a connected graph on $n$ vertices. Then $A(G)B(G) = A(\Gamma)$
if and only if the following hold.


(i) $G$ is a cycle on an odd number of vertices.
(ii) There exists a permutation matrix $P$ such that $PA(G)P' = C_{(2,n)}$, $PB(G)P' =
C_{(d+1,d+2)}$ and $PA(\Gamma)P' = C_{(d,d+1,d+2,d+3)}$, where $d = \lfloor \frac{n}{2} \rfloor$.


Proof Let $A(G)B(G) = A(\Gamma)$, in which case $G$ is triangle free and $n \ge 4$. First,
we prove that the graph $G$ is a cycle. Suppose that $G$ is unicyclic; then $G$ has a
unique proper cycle subgraph and $n > 4$. We claim that the vertices and edges on
this cycle are labeled by the same set of indices. Otherwise, from Theorem 17.17 and
the symmetry of $A(G)B(G)$ we will have two cycles in $G$, which is not possible in a
connected unicyclic graph. Without loss of generality, assume that both the vertices
and edges of the cycle subgraph are indexed by $1, 2, \ldots, k$ for some $4 \le k < n$,
i.e., $v_1, \ldots, v_k$ are the vertices on the cycle subgraph and $e_1, \ldots, e_k$ are the edges.
Now $A(G)B(G)$ is of the form $\begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}$, where the block $C_1$ corresponds to the
vertices and edges of the cycle subgraph. Now for any edge $e_i$ not on the cycle (i.e.,
$k + 1 \le i \le n$) but incident with some vertex of the cycle, $e_i$ is adjacent to three
vertices on the cycle. So, the corresponding column in $C_2$ has three 1s. But each
vertex $v_j$ not on the cycle (i.e., $k + 1 \le j \le n$) is adjacent to at most two edges
of the cycle. Therefore, each row of $C_3$ has at most two 1s. This contradicts the
symmetry of the product $A(G)B(G)$. So, $G$ must be a cycle. Further, note that given
any cycle $G$ and its adjacency matrix $A(G)$, there exists a permutation matrix $P$ such
that $PA(G)P' = C_{(2,n)}$. Also note that $PA(G)B(G)P' = PA(G)P' \, PB(G)P'$ and
$PA(\Gamma)P'$ is also graphical. Now we refer to Theorem 17.18 for the rest of the
proof.


Let $C_n$ be a cycle on $n$ vertices $v_1, v_2, \ldots, v_n$, where $n$ is even. We can observe
that the 1-factor graph $H$ with vertices $v_1, v_2, \ldots, v_n$ and $v_i \sim_H v_{d+i}$, where $d$ is the
diameter of the cycle $C_n$, is a companion of $C_n$ with $v_i \sim_G v_{i+1}$ for $1 \le i \le n - 1$,
and $v_n \sim_G v_1$. From Theorem 17.18, a cycle graph $C_n$ with $n \ge 5$ and $n$ odd also has
a companion.

For the graph $\overline{K_m}$, every graph on $m$ vertices is a companion. The generalized
wheel $W_{m,n}$, which is the join of the above two graphs, has many more edges than
the cycle graph. We show that when $m$ and $n$ are even and $n \ge 4$, the generalized
wheel has a companion.

Let $v_1, v_2, \ldots, v_n$ be the peripheral vertices and $v'_1, v'_2, \ldots, v'_m$ be the central
vertices of the generalized wheel $G = W_{m,n}$. The peripheral vertices $v_1, v_2, \ldots, v_n$
are such that $v_i \sim_G v_{i+1}$, $1 \le i \le n - 1$, and $v_n \sim_G v_1$. So, the adjacency matrix of
$G$ can be written as follows:

$$A(G) = \begin{pmatrix} C_{n \times n} & J_{n \times m} \\ J_{m \times n} & 0_{m \times m} \end{pmatrix},$$

where $C_{n \times n}$ is the adjacency matrix of $C_n$, $J_{n \times m}$ and
$J_{m \times n}$ are all-ones matrices of the respective orders, and $0_{m \times m}$ is a zero matrix of
order $m$.

The following theorem determines positive integers $m$ and $n$ for which the graph
$W_{m,n}$ has a companion.

Theorem 17.21 Let $G = W_{m,n}$ be the generalized wheel graph with the central vertices
$v'_1, v'_2, \ldots, v'_m$ and peripheral vertices $v_1, v_2, \ldots, v_n$. Then $G$ has a companion
$H$ if and only if $m$ and $n$ are even and $n \ge 4$. The graph $H$ is a 1-factor graph with
$v_i \sim_H v_{d+i}$, $1 \le i \le \frac{n}{2}$, and $d = \frac{n}{2}$ is the diameter of the cycle $C_n$.

Proof Assume that the graph $G = W_{m,n}$ has a companion $H$. Since $G$ is connected,
$H$ is a regular subgraph of $\overline{G}$ by Corollary 17.1. Note that $\overline{G} = K_m \cup \overline{C_n}$ and the
number of components of $\overline{G}$ is one more than that of $\overline{C_n}$. Thus, $H$ is a subgraph of
$K_m \cup \overline{C_n}$. The adjacency matrix of $H$ can be viewed as

$$A(H) = \begin{pmatrix} B_{n \times n} & 0_{n \times m} \\ 0_{m \times n} & D_{m \times m} \end{pmatrix},$$

where $B_{n \times n}$ is the adjacency matrix of the subgraph of
$H$ induced by the vertices $v_1, v_2, \ldots, v_n$, and $D_{m \times m}$ is the adjacency matrix of the
subgraph of $H$ induced by the vertices $v'_1, v'_2, \ldots, v'_m$, and $0_{n \times m}$ and $0_{m \times n}$ are zero
matrices of orders $n \times m$ and $m \times n$, respectively.
The product of $A(G)$ and $A(H)$ is given by

$$A(G)A(H) = \begin{pmatrix} C_{n \times n} B_{n \times n} & J_{n \times m} D_{m \times m} \\ J_{m \times n} B_{n \times n} & 0_{m \times m} \end{pmatrix}.$$

From the (1, 2) and (2, 1) blocks of $A(G)A(H)$, we can note that no row (column)
of either $B_{n \times n}$ or $D_{m \times m}$ may have two or more non-zero entries, because otherwise
there would be an entry greater than or equal to 2 in $A(G)A(H)$, a contradiction to
the fact that $A(G)A(H)$ is graphical.

Hence, by regularity of $H$, it follows that $B_{n \times n}$ is a permutation matrix and
$J_{m \times n} B_{n \times n} = J_{m \times n}$. By a similar argument, $D_{m \times m}$ is a permutation matrix. Since
$B_{n \times n}$ and $D_{m \times m}$ are permutation matrices and are adjacency matrices of graphs, the
graphs realizing them are 1-factor graphs, which implies that $m$ and $n$ are even.

Now we proceed to show that $v_i$ is adjacent to $v_{\frac{n}{2}+i}$, $1 \le i \le \frac{n}{2}$, in the graph $H$.

Let $C_{n \times n} B_{n \times n} = \Gamma'_{n \times n}$. Since $B_{n \times n}$ is a permutation matrix, $\Gamma'_{i,j} \le 1$ for $1 \le
i, j \le n$.

Let $B_{i,j} = 1$ for some $i$ and $j$. Then $\Gamma'_{i-1,j} = 1$ and $\Gamma'_{i+1,j} = 1$.

By the symmetry of $\Gamma'$,

$$\Gamma'_{j,i-1} = (C_{n \times n} B_{n \times n})_{j,i-1} = 1 \;\Rightarrow\; C_{j,j-1} B_{j-1,i-1} + C_{j,j+1} B_{j+1,i-1} = 1.$$

The above implies that either $B_{j-1,i-1} = 0$ and $B_{j+1,i-1} = 1$, or $B_{j-1,i-1} = 1$ and
$B_{j+1,i-1} = 0$.

Suppose that $B_{j-1,i-1} = 0$ and $B_{j+1,i-1} = 1$.

Because of symmetry, $B_{i-1,j+1} = 1$, which implies that $\Gamma'_{i-2,j+1} = 1 = \Gamma'_{i,j+1}$.

By the symmetry of $\Gamma'$, $\Gamma'_{j+1,i-2} = C_{j+1,j} B_{j,i-2} + C_{j+1,j+2} B_{j+2,i-2} = 1$.

The above implies that either $B_{j+2,i-2} = 1$ and $B_{j,i-2} = 0$, or $B_{j+2,i-2} = 0$ and
$B_{j,i-2} = 1$. But, since $B_{j,i} = 1$ and $B$ is a permutation matrix, $B_{j,i-2}$ must be zero
and $B_{j+2,i-2} = 1$.

Continuing similarly, we get $B_{j+r,i-r} = 1$ for every $r$ with $3 \le r \le k - j$. In
general, $B_{k,i+j-k} = 1$.


When $i + j$ is an even integer and $k = \frac{i+j}{2}$, $B_{k,i+j-k} = 1$ becomes $B_{\frac{i+j}{2},\frac{i+j}{2}} = 1$,
which is not possible. Also, when $i + j + 1$ is even and $k = \frac{i+j+1}{2}$, $B_{k,i+j-k} = 1$
becomes $B_{\frac{i+j+1}{2},\frac{i+j+1}{2}-1} = 1$, which is not possible since the vertices $v_{\frac{i+j+1}{2}}$ and
$v_{\frac{i+j+1}{2}-1}$ are adjacent in $G$, and $H$ is a subgraph of $\overline{G}$.

Therefore $B_{j-1,i-1} = 1$ and $B_{j+1,i-1} = 0$.

Using the fact that $B$ is a permutation matrix, one can show that $B_{i-1,j-1} =
1$, $B_{i-2,j-2} = 1$, $\ldots$, $B_{1,j-i+1} = 1$, and $B_{i+1,j+1} = 1$, $B_{i+2,j+2} = 1$, $\ldots$, $B_{i+n-j,n}
= 1$, and $B_{i+n-j+1,1} = 1$.

We have $B_{1,j-i+1} = 1$ and $B_{i+n-j+1,1} = 1$. These two together imply that

$$j - i + 1 = i + n - j + 1, \quad \text{that is,} \quad j = \frac{n}{2} + i \ \text{ for } i = 1, \ldots, \frac{n}{2}.$$

Thus $H$ is a 1-factor graph with $v_i \sim_H v_{\frac{n}{2}+i}$, $1 \le i \le \frac{n}{2}$, and a 1-factor induced by the
vertices $v'_1, v'_2, \ldots, v'_m$.

Conversely, let $H$ be a subgraph of $\overline{G} = K_m \cup \overline{C_n}$ satisfying the conditions of the
theorem. Then

$$A(H) = \begin{pmatrix} B'_{n \times n} & 0_{n \times m} \\ 0_{m \times n} & D'_{m \times m} \end{pmatrix},$$

where $B'_{n \times n}$ is the adjacency matrix of a 1-factor graph
on the vertices $v_1, v_2, \ldots, v_n$, in which the $i$th row has the non-zero entry in the
$(\frac{n}{2} + i)$th column, $1 \le i \le \frac{n}{2}$. The matrix $D'_{m \times m}$ is the adjacency matrix of a 1-factor
graph on the vertices $v'_1, v'_2, \ldots, v'_m$.

Then, it is easy to observe that $A(G)A(H)$ is a symmetric $(0, 1)$ matrix with zero
diagonal. Hence $A(G)A(H)$ is graphical.
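
The converse direction of Theorem 17.21 lends itself to a quick numerical check. The sketch below is illustrative and not part of the original proof. It assumes peripheral vertices are numbered 0, ..., n-1 and central vertices n, ..., n+m-1, and it pairs the central vertices consecutively, which is one arbitrary choice of 1-factor among many. All function names are hypothetical.

```python
def wheel(m, n):
    """Adjacency matrix of the generalized wheel W_{m,n}: a cycle C_n joined to m
    mutually non-adjacent central vertices (numbered n..n+m-1 in this sketch)."""
    N = n + m
    A = [[0] * N for _ in range(N)]
    for i in range(n):
        j = (i + 1) % n
        A[i][j] = A[j][i] = 1          # cycle edge
        for c in range(n, N):
            A[i][c] = A[c][i] = 1      # spoke to every central vertex
    return A


def matching_matrix(N, pairs):
    """Adjacency matrix of the graph on N vertices whose edges are `pairs`."""
    M = [[0] * N for _ in range(N)]
    for u, v in pairs:
        M[u][v] = M[v][u] = 1
    return M


def antipodal_companion(m, n):
    """Candidate companion: v_i ~ v_{i + n/2} on the rim, central vertices paired
    consecutively (assumes m and n are even)."""
    pairs = [(i, i + n // 2) for i in range(n // 2)]
    pairs += [(c, c + 1) for c in range(n, n + m, 2)]
    return matching_matrix(n + m, pairs)


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def is_graphical(M):
    """True if M is a symmetric (0,1) matrix with zero diagonal."""
    n = len(M)
    return all(M[i][i] == 0 for i in range(n)) and all(
        M[i][j] in (0, 1) and M[i][j] == M[j][i]
        for i in range(n) for j in range(n)
    )


if __name__ == "__main__":
    for m, n in ((2, 4), (2, 6), (4, 8)):
        print(m, n, is_graphical(matmul(wheel(m, n), antipodal_companion(m, n))))
```

Replacing the antipodal rim matching by a different perfect matching of the rim, for example the pairs (0, 2), (1, 4), (3, 5) when n = 6, breaks the symmetry of the product, in line with the uniqueness argument in the proof.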


17.5 Realization of $A(G)A(G_k^P)$


In this section, we introduce the notion of perfect matching property for a $k$-partition
of the vertex set of a given graph. We consider non-trivial graphs $G$ and $G_k^P$, the
$k$-complement of $G$ with respect to a $k$-partition $P$ of $V(G)$, to prove that $A(G)A(G_k^P)$
is realizable as a graph if and only if $P$ has the perfect matching property. Also,
we discuss the characterization of complete $k$-partite graphs $K_{n_1,n_2,\ldots,n_k}$ that have a
commuting decomposition into a perfect matching and its $k$-complement. The notion
of $k$-complement of a graph $G$ with respect to a partition $P$ of $V(G)$ with $k$ parts is
a generalization of the concept of the complement of a graph $G$ introduced by [8].


Definition 17.10 [8] Given a graph $G$ and a $k$-partition $P = \{V_1, V_2, \ldots, V_k\}$ of
$V(G)$, the $k$-complement of $G$ with respect to the partition $P$ is a graph $H$ with
$V(H) = V(G)$ in which the adjacency between $v \in V_i$ and $w \in V_j$ is defined by

$$v \sim_H w \quad \text{if} \quad \begin{cases} v \nsim_G w, & \text{for } i \ne j, \\ v \sim_G w, & \text{for } i = j. \end{cases}$$


Definition 17.11 A collection of subgraphs $H_1, H_2, \ldots, H_k$ of $G$ is said to be a
decomposition of $G$ if $\bigcup_{i=1}^{k} H_i = G$ and $E(H_i) \cap E(H_j) = \emptyset$ for $i \ne j$.


The graph $H$ of Definition 17.10 is usually denoted by $G_k^P$ and is also known as a generalized
complement of the graph $G$; it reduces to the well-known complement of $G$ when
$k$ is equal to $n$, the number of vertices in the graph $G$. From the definition, it is clear
that the subgraphs induced by $V_i$, $1 \le i \le k$, in $G$ and $G_k^P$ are the same.


Definition 17.12 A decomposition $H_1, H_2, \ldots, H_k$ of $G$ is said to be a commuting
decomposition of $G$ if $H_1, H_2, \ldots, H_k$ are spanning subgraphs and $A(H_i)A(H_j) =
A(H_j)A(H_i)$ for all $i$ and $j$ with $1 \le i, j \le k$.


Theorem 17.22 Let $G$ be a graph, and $P = \{V_1, V_2, \ldots, V_k\}$ be a $k$-partition of
$V(G)$. The matrix product $A(G)A(G_k^P)$ is graphical if and only if the following
statements are true.


(i) The subgraph $\langle V_i \rangle$ is totally disconnected for every $i$, $1 \le i \le k$.
(ii) For every pair of vertices $u \in V_i$ and $v \in V_j$, there exists at most one vertex
$w \in V_r$, $r \ne i, j$ and $1 \le i, j, r \le k$, such that $u \sim_G w$ and $w \nsim_G v$.
(iii) For a pair of vertices $u$ and $v$, if there exists $w$ satisfying (ii), then there exists
a unique $w' \in V_l$, $l \ne i, j$, such that $v \sim_G w'$ and $w' \nsim_G u$.


Proof Let $A(G)A(G_k^P)$ be graphical. If $u \sim_G v$ for any $u, v \in V_i$, then from the
definition of $G_k^P$ we see that $u \sim_{G_k^P} v$. This contradicts the fact that $G_k^P$ is a subgraph
of $\overline{G}$. This proves (i).

From (ii) of Theorem 17.1, we know that for every pair of vertices $u$ and $v$, there
exists a vertex $w$ different from $u$ and $v$ such that $u \sim_G w$ and $w \sim_{G_k^P} v$. Since $G_k^P$
is a subgraph of $\overline{G}$, it is trivial that $w \nsim_G v$. Further, if $w \in V_r$ for some $r$, then from
(i), it follows that $r$ is neither equal to $i$ nor equal to $j$. Thus (ii) is proved.

Proof of (iii) follows immediately from (iii) of Theorem 17.1.

Conversely, let (i), (ii), and (iii) of the theorem be satisfied. Consider any two
vertices $u \in V_i$ and $v \in V_j$. Since $\langle V_i \rangle$ and $\langle V_j \rangle$ are totally disconnected in $G$, from
the definition of $G_k^P$ and from (ii), it follows that there is at most one $G$-$G_k^P$ path
between $u$ and $v$. For the same reason, (iii) implies that there exists at most one
$G_k^P$-$G$ path between $u$ and $v$ whenever there exists one $G$-$G_k^P$ path between $u$ and
$v$. Now, referring to Theorem 17.1, we see that $A(G)A(G_k^P)$ is graphical.


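Example 5


Consider the graph $G$ given in Fig. 17.14, and its 4-complement $G_4^P$ with respect to
the partition $P = \{V_1, V_2, V_3, V_4\}$ as shown. It is clear that $\langle V_1 \rangle$, $\langle V_2 \rangle$, $\langle V_3 \rangle$, and $\langle V_4 \rangle$
are totally disconnected and also satisfy conditions (ii) and (iii) of Theorem 17.22.
Further, by direct computation of $A(G)A(G_4^P)$, also given below, we observe that
the product is graphical. The graphs $G$, $G_4^P$, and their matrix product $\Gamma$ are given in
Fig. 17.14.

$$A(G) = \begin{pmatrix}
0&0&0&1&0&0&0&0\\
0&0&0&0&1&0&0&0\\
0&0&0&0&0&1&0&0\\
1&0&0&0&0&0&0&0\\
0&1&0&0&0&0&0&0\\
0&0&1&0&0&0&0&0\\
0&0&0&0&0&0&0&1\\
0&0&0&0&0&0&1&0
\end{pmatrix}, \quad
A(G_4^P) = \begin{pmatrix}
0&0&0&0&1&1&1&1\\
0&0&0&1&0&1&1&1\\
0&0&0&1&1&0&1&1\\
0&1&1&0&0&0&1&1\\
1&0&1&0&0&0&1&1\\
1&1&0&0&0&0&1&1\\
1&1&1&1&1&1&0&0\\
1&1&1&1&1&1&0&0
\end{pmatrix},$$

$$A(\Gamma) = \begin{pmatrix}
0&1&1&0&0&0&1&1\\
1&0&1&0&0&0&1&1\\
1&1&0&0&0&0&1&1\\
0&0&0&0&1&1&1&1\\
0&0&0&1&0&1&1&1\\
0&0&0&1&1&0&1&1\\
1&1&1&1&1&1&0&0\\
1&1&1&1&1&1&0&0
\end{pmatrix}.$$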
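
The direct computation in Example 5 can be reproduced in a few lines. The sketch below is not from the original text. The partition $\{\{v_1, v_2, v_3\}, \{v_4, v_5, v_6\}, \{v_7\}, \{v_8\}\}$ used here is an assumption inferred from the two printed matrices (the figure that shows it is not available), vertices are numbered from 0, and the helper names are hypothetical.

```python
def rows(*strings):
    """Turn strings such as '00010000' into rows of a 0/1 matrix."""
    return [[int(ch) for ch in s] for s in strings]


def k_complement(A, parts):
    """k-complement of the graph with adjacency matrix A with respect to `parts`:
    adjacency is kept inside a part and complemented between different parts."""
    n = len(A)
    part_of = {v: idx for idx, part in enumerate(parts) for v in part}
    H = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                same = part_of[u] == part_of[v]
                H[u][v] = A[u][v] if same else 1 - A[u][v]
    return H


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def is_graphical(M):
    """True if M is a symmetric (0,1) matrix with zero diagonal."""
    n = len(M)
    return all(M[i][i] == 0 for i in range(n)) and all(
        M[i][j] in (0, 1) and M[i][j] == M[j][i]
        for i in range(n) for j in range(n)
    )


# A(G) as printed in Example 5 (a perfect matching on eight vertices).
A_G = rows("00010000", "00001000", "00000100", "10000000",
           "01000000", "00100000", "00000001", "00000010")

# Assumed partition, inferred from the printed matrices (0-based vertex labels).
P = [{0, 1, 2}, {3, 4, 5}, {6}, {7}]

A_GP = k_complement(A_G, P)
A_Gamma = matmul(A_G, A_GP)

if __name__ == "__main__":
    for row in A_Gamma:
        print("".join(map(str, row)))
    print("graphical:", is_graphical(A_Gamma))
```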


Let $G$ be a graph on $n$ vertices and $P = \{V_1, V_2, \ldots, V_n\}$ be a partition of $V(G)$
of size $n$. Then every part is a singleton set and the $n$-complement $G_n^P$ is the same as
the actual complement $\overline{G}$ of $G$. Then, by Theorem 17.8, $A(G)A(G_n^P) = A(G)A(\overline{G})$
is graphical if and only if $G$ is either a 1-factor graph or an $(n - 2)$-regular graph, or
else $C_5$, the cycle on five vertices. So, while characterizing graph $G$ and partition $P$
of $V(G)$ satisfying the property that $A(G)A(G_k^P)$ is graphical, we consider $k \ne n$.


Definition 17.13 Given a graph $G$, a $k$-partition $P = \{V_1, V_2, \ldots, V_k\}$ of $V(G)$ is
said to have perfect matching property in $G$ if $V_i$ is an independent set for every $i$,
$1 \le i \le k$, and for every $V_i$ there exists exactly one $V_j$, $1 \le i, j \le k$ and $i \ne j$, such
that the graph induced by the set $V_i \cup V_j$ is a perfect matching. In such a case, $V_i$
and $V_j$ are said to be the matching partite sets.


Lemma 17.7 Let $G$ be a 1-factor graph and let $P = \{V_1, V_2, \ldots, V_k\}$ be a $k$-partition
of $V(G)$ ($k < n$). Then $A(G)A(G_k^P)$ is graphical if and only if the partition
$P$ has the perfect matching property.


Proof Let $G$ be a 1-factor graph, and suppose that $A(G)A(G_k^P)$ is graphical. Since
$k \ne n$, there exists at least one $V_i$ with two or more vertices. Consider any two
vertices $u$ and $v$ from $V_i$, $1 \le i \le k$. Since $A(G)A(G_k^P)$ is graphical, from (i) of
Theorem 17.22, we know that $\langle V_i \rangle$ is totally disconnected and therefore $u \nsim_G v$.
Since $G$ is a 1-factor graph, there exist distinct vertices $w_1$ and $w_2$ such that $u \sim_G w_1$
and $v \sim_G w_2$, and further, $w_1 \nsim_G w_2$.

Now, to prove that $P$ has perfect matching property, it is enough to prove that
both $w_1$ and $w_2$ are in the same partite set in $P$. Suppose, to the contrary, that $w_1$ and
$w_2$ are in different partite sets, say $w_1 \in V_r$ and $w_2 \in V_s$, $r \ne s$. From the definition
of $G_k^P$ we observe that there exists a $G$-$G_k^P$ path $u \sim_G w_1 \sim_{G_k^P} w_2$. But there is no
$G_k^P$-$G$ path between $u$ and $w_2$, as $v$ is the only vertex adjacent to $w_2$ in $G$ and $u$ and $v$
remain non-adjacent in $G_k^P$. This implies that $A(G)A(G_k^P)$ is not graphical, which is
a contradiction. Therefore, $w_1$ and $w_2$ must be in the same partite set, say $V_r$. Thus,
$\langle V_i \cup V_r \rangle$ is a perfect matching and therefore $P$ has perfect matching property.

[Figure: (a) $G$, (b) $G_4^P$, (c) $\Gamma$]
Fig. 17.14 Graphs $G$, $G_4^P$, and $\Gamma$ with respect to partition $P$ of $V(G)$

Conversely, let the partition $P = \{V_1, V_2, \ldots, V_k\}$ of $V(G)$ have perfect matching
property. Then, each partite set $V_i$ is an independent set in the graph $G$ and therefore,
from the definition of $G_k^P$, we see that $G_k^P$ is a subgraph of $\overline{G}$. Since $G$ is a 1-factor
graph, there exists at most one $G$-$G_k^P$ path between any two vertices. Now, let there
be a $G$-$G_k^P$ path between the vertices $u$ and $v$ given by $u \sim_G x \sim_{G_k^P} v$. Let $u \in V_i$,
$v \in V_j$, and $x \in V_r$. Since $P$ has perfect matching property, we have $r \ne i$, and $V_i$
and $V_r$ are matching partite sets. Also, $r \ne j$ as $V_j$ is an independent set in the graph
$G$ as well as in the graph $G_k^P$. Now for $y \sim_G v$, it is clear that $y$ is not in $V_i$, as $V_i$ and
$V_j$ are not matching partite sets. So, $y$ is not adjacent to $u$ in $G$ and therefore we obtain
a $G_k^P$-$G$ path between $u$ and $v$, given by $u \sim_{G_k^P} y \sim_G v$. Hence, by Theorem 17.1,
$A(G)A(G_k^P)$ is graphical.
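
Lemma 17.7 can be illustrated on the smallest non-trivial case. The sketch below is an illustration under stated assumptions and not part of the original proof: it takes the 1-factor graph on four vertices with edges 0-1 and 2-3, tests two hand-picked partitions, and compares the perfect matching property of Definition 17.13 with graphicality of the product. All helper names are hypothetical.

```python
def k_complement(A, parts):
    """Keep adjacency inside each part, complement it between different parts."""
    n = len(A)
    part_of = {v: idx for idx, part in enumerate(parts) for v in part}
    return [[0 if u == v else
             (A[u][v] if part_of[u] == part_of[v] else 1 - A[u][v])
             for v in range(n)] for u in range(n)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def is_graphical(M):
    """True if M is a symmetric (0,1) matrix with zero diagonal."""
    n = len(M)
    return all(M[i][i] == 0 for i in range(n)) and all(
        M[i][j] in (0, 1) and M[i][j] == M[j][i]
        for i in range(n) for j in range(n)
    )


def has_perfect_matching_property(A, parts):
    """Definition 17.13: every part is independent, and every part has exactly one
    partner part such that their union induces a perfect matching."""
    def independent(S):
        return all(A[u][v] == 0 for u in S for v in S)

    def induces_perfect_matching(S):
        # Every vertex of S has exactly one neighbour inside S.
        return all(sum(A[u][v] for v in S) == 1 for u in S)

    if not all(independent(part) for part in parts):
        return False
    for i, Vi in enumerate(parts):
        partners = [j for j, Vj in enumerate(parts)
                    if j != i and induces_perfect_matching(Vi | Vj)]
        if len(partners) != 1:
            return False
    return True


# 1-factor graph on four vertices with edges 0-1 and 2-3.
G = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

good = [{0, 2}, {1, 3}]      # the two parts are matched to each other
bad = [{0, 2}, {1}, {3}]     # {0, 2} has no matching partner

if __name__ == "__main__":
    for parts in (good, bad):
        product = matmul(G, k_complement(G, parts))
        print(has_perfect_matching_property(G, parts), is_graphical(product))
```

In both cases the two printed values agree, as the lemma predicts for a 1-factor graph.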


Theorem 17.23 Let $G$ be a graph on $n$ vertices and $P = \{V_1, V_2, \ldots, V_k\}$ be a $k$-partition
of $V(G)$ with $k \ne n$. Then $A(G)A(G_k^P)$ is graphical if and only if one of
the following holds.


(i) Either $G$ or $G_k^P$ is a null graph, in which case the other one is a complete
$k$-partite graph.
(ii) The partition $P$ has perfect matching property either in $G$ or in $G_k^P$.


Proof Let $A(G)A(G_k^P)$ be graphical, in which case each of the parts is an independent
set in $G$. Therefore, from the definition of $k$-complement of $G$, it is trivial that $G$
is a null graph (respectively complete $k$-partite graph) if and only if $G_k^P$ is a complete
$k$-partite graph (respectively null graph). In this case (i) holds.

Suppose that $G$ is neither the null graph nor a complete $k$-partite graph. Consider
a pair of vertices $u$ and $v$ from one part of the partition, say $V_1$, which is possible
as $k \ne n$. Since $A(G)A(G_k^P)$ is graphical, by Theorem 17.22, we have one of the
following cases:


(a) For all $w \notin V_1$, $u \sim_G w$ and $w \sim_G v$ (equivalently, $u \nsim_{G_k^P} w$ and $w \nsim_{G_k^P} v$).
(b) There exists a unique $r \notin V_1$ such that $u \sim_G r$ and $r \sim_{G_k^P} v$; in which case $G$
has a vertex $s$ such that $u \sim_{G_k^P} s$ and $s \sim_G v$.


Case (a): For any $x \notin V_1$, it is not possible that both $u \sim_{G_k^P} x$ and $x \sim_{G_k^P} v$, for
otherwise there would be two $G$-$G_k^P$ paths between $w$ and $x$. So, for every $x \notin V_1$,
$x$ is adjacent to both of $u$ and $v$ in $G$. This implies that $\deg_{G_k^P} u = 0$. By Note 17.2,
$\deg_{G_k^P} u = \deg_{G_k^P} x = 0$ for all $x \notin V_1$. Further, from the definition of $k$-complement
of $G$ and the fact that $V_1$ is an independent set in $G$, it follows that $\deg_{G_k^P} v = 0$,
since $\deg_{G_k^P} x = 0$ for all $x \notin V_1$. Hence, $G_k^P$ is the null graph, which is not the case
under discussion.

Case (b): Without loss of generality, let there exist a unique $r \notin V_1$ such that
$u \sim_G r$ and $r \sim_{G_k^P} v$, in which case there exists a vertex $s$ in $G$ such that $u \sim_{G_k^P} s$
and $s \sim_G v$. For any $t \ne r, s$ and $t \notin V_1$, both $u$ and $v$ are simultaneously adjacent
or simultaneously non-adjacent to $t$ in $G$. Also, note that there is no pair of vertices
$p$ and $q$ such that $p$ is adjacent to both $u$ and $v$ in $G$, and $q$ is non-adjacent to
both $u$ and $v$ in $G$. Otherwise, there would be two $G$-$G_k^P$ paths ($p \sim_G u \sim_{G_k^P} q$ and
$p \sim_G v \sim_{G_k^P} q$) between $p$ and $q$.

If both the vertices $u$ and $v$ are not adjacent to $t \notin V_1$ and $t \ne r, s$, then $\deg_G u =
\deg_G v = 1$. As $u$ and $v$ are adjacent to $t$ in $G_k^P$, from Note 17.2, we have $\deg_G t =
\deg_G u = \deg_G v = 1$ for all $t \notin V_1$. Since $r$ and $v$ are adjacent to each other in $G_k^P$,
$\deg_G r = 1$. For all $z \in V_1$ and $z \ne u$, we have $r \sim_{G_k^P} z$, and by Note 17.2, it implies
that $\deg_G z = 1$.

Similarly, in case both the vertices $u$ and $v$ are adjacent to $t \notin V_1$ and $t \ne r, s$,
we prove that $G_k^P$ is a 1-factor graph. So in case (b), either $G$ or $G_k^P$ is a 1-factor
graph. Therefore, by Lemma 17.7, it follows that the partition $P$ satisfies the perfect
matching property.

The converse part is trivial if $G$ and $G_k^P$ satisfy the condition (i) given in the
theorem. If $G$ and $G_k^P$ satisfy the condition (ii), the converse part follows from
Lemma 17.7, as a graph $G$ with partition $P$ satisfying perfect matching property is
trivially a 1-factor graph.


Next, we obtain a commuting decomposition of a complete k-partite graph G into 
a perfect matching and its k-complement with respect to the k-partition of G. First 
we note the following. 


Note 17.14 From Theorem 17.1, one can conclude that the $(i, j)$ entry of $A(G)A(H)$
represents the number of $G$-$H$ paths from $v_i$ to $v_j$. Hence, $G$ and $H$ commute with each
other if and only if for every two vertices $v_i$ and $v_j$, $i \ne j$, $1 \le i, j \le n$, the number
of $G$-$H$ paths from $v_i$ to $v_j$ is the same as the number of $H$-$G$ paths from $v_i$ to $v_j$.


Theorem 17.24 explains the commuting decomposition of $K_{n_1,n_2,\ldots,n_k}$ into a perfect
matching and its $k$-complement. We consider the partition $P$ to be the $k$-partition of
the complete $k$-partite graph $K_{n_1,n_2,\ldots,n_k}$.


Theorem 17.24 Let $G$ be a complete $k$-partite graph $K_{n_1,n_2,\ldots,n_k}$ on $n$ vertices with
the $k$-partition of $V(G)$ given by $P = \{V_1, V_2, \ldots, V_k\}$ and let $|V_i| = n_i$, $1 \le i \le k$.
Then $G$ is decomposable into two commuting subgraphs, one of which is a perfect
matching of $G$, say $H$, and the other one is the $k$-complement of $H$, with respect to
the same partition $P = \{V_1, V_2, \ldots, V_k\}$, if and only if the number of partite sets in
$P$ with the same cardinality $n_i$ is even for every $i$, $1 \le i \le k$.


Proof Consider the complete $k$-partite graph $G = K_{n_1,n_2,\ldots,n_k}$. Suppose that $G$ has a
commuting decomposition into two subgraphs $H_1$ and $H_2$, one of which, say $H_1$, is a
perfect matching of $G$ and the other one is given by $H_2 = (H_1)_k^P$, the $k$-complement
of $H_1$ with respect to the partition $P = \{V_1, V_2, \ldots, V_k\}$.

Since $G$ has a perfect matching $H_1$, the number of vertices in $G$ is even. If all the
partite sets are singletons, then $k = n$ is also even and the assertion of the theorem
is obviously true.

Suppose that there is a partite set, say $V_i$, $1 \le i \le k$, which has two or more
vertices. Let $u \in V_i$ and $v \in V_j$ with $u \sim_{H_1} v$, $1 \le i, j \le k$, $i \ne j$. Let $w \in V_i$ and
let $w \sim_{H_1} x$. If $x \in V_l$, $l \ne j$, then there is an $H_1$-$H_2$ path from $w$ to $v$ but no $H_2$-$H_1$
path from $w$ to $v$, which by Note 17.14 contradicts the assumption that $H_1$ and
$H_2$ commute with each other. Hence, if $u \in V_i$ and $v \in V_j$, $1 \le i, j \le k$, $i \ne j$, and
$u \sim_{H_1} v$, then for each vertex in $V_i$ the only adjacent vertex in $H_1$ is also in $V_j$. The
same is true for $V_j$, which implies that $|V_i| = |V_j|$. Similarly, partite sets with the
same cardinality are paired and since $H_1$ is a perfect matching, the number of partite
sets with cardinality $n_i$ is even for every $i$, $1 \le i \le k$.

Conversely, assume that the number of partite sets with the same cardinality is
even. Then we can pair the partite sets having the same cardinality. Without loss of
generality, assume that $|V_1| = |V_2| = n_1$, $|V_3| = |V_4| = n_2$, $\ldots$, $|V_{2r-1}| = |V_{2r}| =
n_r$, where $2r = k$. Observe that $n_i$ may be equal to $n_j$ for some $i$ and $j$, $1 \le i, j \le
r$, $i \ne j$. Since for every $i$, $1 \le i \le r$, $\langle V_{2i-1} \cup V_{2i} \rangle$ is a complete bipartite graph with
$|V_{2i-1}| = |V_{2i}|$, there is a perfect matching in the induced graph. The union of all these
perfect matchings gives a perfect matching in $G$, denoted by $H_1$. In the graph $G \setminus H_1$,
obtained by deleting the edges of $H_1$ from $G$, two vertices are adjacent if and only if
they are in two different partite sets and they are not adjacent in $H_1$. Hence, $G \setminus H_1$ is
the same as $(H_1)_k^P$, which implies that $G = H_1 \cup (H_1)_k^P$ and $E(H_1) \cap E((H_1)_k^P) = \emptyset$.
Thus $H_1$ and $H_2 = (H_1)_k^P$ form a decomposition of $G$. To show that $H_1$ and $(H_1)_k^P$
form a commuting decomposition of $G$, we show that for any two vertices $u$ and $v$
in $G$, the number of $H_1$-$H_2$ paths from $u$ to $v$ is the same as the number of $H_2$-$H_1$ paths
from $u$ to $v$.

Consider $u, v$ in the same partite set $V_i$. Let $u \sim_{H_1} u'$ and $v \sim_{H_1} v'$. Then, $u \sim_{H_1}
u' \sim_{H_2} v$ is the only $H_1$-$H_2$ path and $u \sim_{H_2} v' \sim_{H_1} v$ is the only $H_2$-$H_1$ path from $u$
to $v$.

Consider $u \in V_i$ and let $u' \in V_j$ with $u \sim_{H_1} u'$. For any other vertex $x$ in $V_j$,
there exists neither an $H_1$-$H_2$ path nor an $H_2$-$H_1$ path from $u$ to $x$. Consider a
vertex $x \in V_l$, $l \ne i, j$, with $x \sim_{H_1} x'$. Then $u \sim_{H_1} u' \sim_{H_2} x$ is the only $H_1$-$H_2$ path
and $u \sim_{H_2} x' \sim_{H_1} x$ is the only $H_2$-$H_1$ path from $u$ to $x$. Hence the decomposition
$\{H_1, H_2 = (H_1)_k^P\}$ of $G$ is a commuting decomposition.
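
The construction in the converse part can be tried out on a concrete case. The following sketch is not part of the original proof: it builds $K_{2,2,3,3}$, pairs equal-sized parts as in the proof, and checks that the perfect matching $H_1$ commutes with $H_2 = G \setminus H_1$. It also tests one particular perfect matching of $K_{2,2,2}$, where three parts share the same size. Vertex numbering and all helper names are assumptions made for the illustration.

```python
def complete_multipartite(sizes):
    """Adjacency matrix and parts (lists of vertex labels) of K_{sizes[0], sizes[1], ...}."""
    parts, start = [], 0
    for s in sizes:
        parts.append(list(range(start, start + s)))
        start += s
    part_of = [idx for idx, part in enumerate(parts) for _ in part]
    n = start
    A = [[1 if part_of[u] != part_of[v] else 0 for v in range(n)] for u in range(n)]
    return A, parts


def matching_matrix(n, pairs):
    """Adjacency matrix of the graph on n vertices whose edges are `pairs`."""
    M = [[0] * n for _ in range(n)]
    for u, v in pairs:
        M[u][v] = M[v][u] = 1
    return M


def paired_matching(parts):
    """Match parts[0] with parts[1], parts[2] with parts[3], ..., vertex by vertex.
    Assumes consecutive parts have equal size."""
    pairs = []
    for a, b in zip(parts[0::2], parts[1::2]):
        pairs.extend(zip(a, b))
    return pairs


def difference(A, M):
    """Entrywise A - M, i.e. the graph obtained by deleting the edges of M from A."""
    return [[a - m for a, m in zip(ra, rm)] for ra, rm in zip(A, M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def commute(X, Y):
    return matmul(X, Y) == matmul(Y, X)


if __name__ == "__main__":
    A, parts = complete_multipartite([2, 2, 3, 3])
    H1 = matching_matrix(len(A), paired_matching(parts))
    H2 = difference(A, H1)                 # equals the k-complement of H1 here
    print("K_{2,2,3,3}:", commute(H1, H2))

    A2, _ = complete_multipartite([2, 2, 2])
    M = matching_matrix(6, [(0, 2), (1, 4), (3, 5)])
    print("K_{2,2,2}, one matching:", commute(M, difference(A2, M)))
```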


Note 17.15 Let $G$ be a complete $k$-partite graph $K_{n_1,n_2,\ldots,n_k}$ with respect to a partition
$P = \{V_1, V_2, \ldots, V_k\}$ of $V(G)$. Then the graph $G$ is decomposable into two
commuting perfect matchings, i.e., both $H_1$ and $H_2 = (H_1)_k^P$ are perfect matchings,
only when $n = 4$ and $k = 2$. Equivalently, $K_{2,2}$ is the only complete $k$-partite graph,
with $k = 2$, which is decomposable into two commuting perfect matchings as above.


Note 17.16 Let $G$ be a complete $k$-partite graph $K_{n_1,n_2,\ldots,n_k}$ with respect to a partition
$P = \{V_1, V_2, \ldots, V_k\}$ of $V(G)$. Then the graph $G$ is decomposable into two
commuting subgraphs, one of which is a perfect matching and the other one is a
Hamiltonian cycle, i.e., $H_1$ is a perfect matching and $H_2 = (H_1)_k^P$ is a Hamiltonian
cycle, when either $n = 6$ and $k = 2$ or $n = 4$ and $k = 4$. Equivalently, $K_{3,3}$ and
$K_{1,1,1,1}$ are the only complete $k$-partite graphs, with $k = 2$ and $k = 4$ respectively,
which are decomposable into two commuting subgraphs, one of them being a perfect
matching and the other one a Hamiltonian cycle.


17.6 Companion Search Algorithm 


This section contains the companion search algorithm which, for a given graph $G$,
constructs a non-trivial companion $H$ of $G$ whenever such a companion exists, and
returns $H = \overline{K_n}$ otherwise. It also returns the matrix product $\Gamma$ of $G$ and $H$.

We know that whenever there is a $G$-$H$ path from a vertex $u$ to a vertex $v$, it results
in an edge in $\Gamma$ only when there also exists an $H$-$G$ path from $u$ to $v$. For the sake
of convenience, we say that there is an arc (i.e., a directed edge) from $u$ to $v$ in $\Gamma$
whenever there is a $G$-$H$ path from $u$ to $v$. When there is an arc from $u$ to $v$ as well
as an arc from $v$ to $u$ in $\Gamma$, there is a symmetric pair of arcs between $u$ and $v$ in $\Gamma$, i.e.,
there is an (undirected) edge between $u$ and $v$ in $\Gamma$. If there is a $G$-$H$ path, but not
an $H$-$G$ path, from $u$ to $v$, then we say that the arc from $u$ to $v$ in $\Gamma$ is asymmetric.

The algorithm makes use of the results of Theorem 17.1. We assume that
the input graph $G$ contains no isolated vertex and that the vertex set of $G$ is
$V(G) = \{0, 1, \ldots, n - 1\}$. As $G$ and $H$ are edge disjoint, the algorithm investigates
the possibility of each edge in $\overline{G}$ to be an edge in $H$. So, it maintains a list of
possible edges of $H$, $E(H)$, and a corresponding list of possible edges of $\Gamma$, $E(\Gamma)$.

The algorithm takes as input the adjacency list of the graph G. The graphs H and 
I are initialized to have empty adjacency lists, that is, the adjacency list of K,. The 
algorithm starts at a root vertex, say 0. It searches for the first edge (0, j) that is not 
an edge of G, out of all possible edges (0, 1), (0, 2), ..., (0, 7 — 1) in lexicographic 
order. If no such edge exists, then no non-trivial companion exists for G, and the 
algorithm returns H = K, and Pr = K,, as the output. If such an edge exists but 
is already included in the adjacency list of H, then the algorithm goes to the next 
possible selection of H edge. 

Once any edge, say (i, j'), is added to the adjacency list of H, then the algorithm 
will add edges to the adjacency list of [ corresponding to all the G-H paths involving 
the edge (i, j) of H. Thatis, E(#) is updated to E(#) U {(, j)} and E (1) is updated 
to EP) U{(,) |r € No} UL A) 1k € NeW}: 

Next, the algorithm starts searching for the first asymmetric arc in E(Γ). Suppose
that there is an asymmetric arc from u to v in Γ. Now, to find an arc from v to u,
the algorithm adds the first feasible edge (x, u), in lexicographic order, to E(H),
x ∈ N_G(v). If no such feasible edge exists, then the algorithm goes to the next
possible choice of edge for H, and otherwise it adds that edge to the adjacency list of
H. Then it repeats the above steps until there are no more asymmetric arcs in E(Γ).
If there is no feasible edge for H, then it will come back to the root vertex and select
the next feasible edge. If there is no such asymmetric arc in Γ, then it returns the first
non-null pair (H, Γ) as output; if no such pair is found, it returns
($\overline{K}_n$, $\overline{K}_n$).

Note that in the algorithm,

\[ N_\Gamma(u) = \{\, v \in V(G) \mid \text{there exists } w \in V(G) \text{ with } u \sim_G w \sim_H v \,\}. \]
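The neighborhood N_Γ(u) above can be computed directly from adjacency lists. The following is a minimal sketch, not the chapter's search algorithm: it only forms A(G)A(H) by enumerating G-H paths and lists the asymmetric arcs. The function names and the two small inputs are hypothetical, and the check `is_graphical` assumes that "graphical" means a symmetric (0,1)-matrix with zero diagonal.

```python
def matrix_product(adj_g, adj_h):
    """Return A(G)A(H); entry [u][v] counts the G-H paths u ~G w ~H v."""
    n = len(adj_g)
    product = [[0] * n for _ in range(n)]
    for u in range(n):
        for w in adj_g[u]:        # u ~G w
            for v in adj_h[w]:    # w ~H v
                product[u][v] += 1
    return product


def asymmetric_arcs(product):
    """Arcs u -> v of Gamma that have no reverse arc v -> u."""
    n = len(product)
    return [(u, v) for u in range(n) for v in range(n)
            if product[u][v] > 0 and product[v][u] == 0]


def is_graphical(product):
    """Assumed definition: symmetric (0,1)-matrix with zero diagonal."""
    n = len(product)
    entries_ok = all(product[u][v] in (0, 1) and product[u][v] == product[v][u]
                     for u in range(n) for v in range(n))
    return entries_ok and all(product[u][u] == 0 for u in range(n))


# Hypothetical inputs (adjacency lists indexed by vertex).
MATCHING_G = [[1], [0], [3], [2]]   # perfect matching {0-1, 2-3}
MATCHING_H = [[2], [3], [0], [1]]   # perfect matching {0-2, 1-3}
PATH_G = [[1], [0, 2], [1]]         # path 0-1-2 (a tree)
EDGE_H = [[2], [], [0]]             # the single non-edge {0, 2} of the path

if __name__ == "__main__":
    print(is_graphical(matrix_product(MATCHING_G, MATCHING_H)))
    print(asymmetric_arcs(matrix_product(PATH_G, EDGE_H)))
```

In the first input the two matchings together form a 4-cycle and every arc of Γ has its reverse. In the second, vertex 1 reaches 0 and 2 by G-H paths but neither reaches 1, so both arcs are asymmetric.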


Algorithm 3

function FINDHΓ(G)
    n := |V(G)|
    H_0 := \overline{K}_n
    Γ_0 := \overline{K}_n
    for j ∈ {1, ..., n − 1} do
        (H, Γ) := ADDEDGE(G, H_0, Γ_0, j, (0, j))
        if H ≠ \overline{K}_n then          ▷ Nontrivial companion obtained
            return (H, Γ)
        end if
    end for                                 ▷ No non-trivial companion graph found
    return (H_0, Γ_0)
end function


17.7 Conclusion and Open Problems


For the adjacency matrix A(G), the nth power has many nice properties. But not even
(A(G))² = A(G)A(G) is graphical. We considered the problem of searching for a
graph H with the property that A(G)A(H) is graphical. In the continuation of our
study, we obtained many results and some new characterizations of certain classes
of graphs in terms of the product of graph matrices. A few of these are mentioned
below.

For a graph G on n (≥ 4) vertices, G is a cycle if and only if A(G)B(G) is an
n × n matrix in which every column and every row contains exactly four 1s.

For a graph G on n (> 3) vertices and m edges, G is a complete bipartite graph if
and only if A(G)B(G) is an n × m matrix in which all entries are 1.

The graph K_{2,2} is the only complete k-partite graph with k = 2 which is decomposable
into two commuting perfect matchings. Also, K_{3,3} and K_{1,1,1,1} are the only
complete k-partite graphs, with k = 2 and k = 4 respectively, which are decomposable
into two commuting subgraphs, one of which is a perfect matching and the other
one a Hamiltonian cycle.
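The first two characterizations above can be checked numerically on small cases. The sketch below assumes that B(G) denotes the n × m vertex-edge incidence matrix (suggested by the stated dimensions). The helper names and the choice of C_6 and K_{2,3} as test graphs are illustrative only.

```python
def adjacency_incidence_product(n, edges):
    """Return A(G)B(G) as an n x m list of lists for a simple graph on 0..n-1."""
    neighbors = [set() for _ in range(n)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Entry (i, e) counts the endpoints of edge e that are adjacent to vertex i.
    return [[(a in neighbors[i]) + (b in neighbors[i]) for (a, b) in edges]
            for i in range(n)]


def cycle_edges(n):
    """Edges of the cycle 0-1-...-(n-1)-0."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete_bipartite_edges(p, q):
    """Edges of K_{p,q} with parts {0..p-1} and {p..p+q-1}."""
    return [(i, p + j) for i in range(p) for j in range(q)]


c6 = adjacency_incidence_product(6, cycle_edges(6))
row_ones = [sum(row) for row in c6]        # number of 1s in each row
col_ones = [sum(col) for col in zip(*c6)]  # number of 1s in each column
k23 = adjacency_incidence_product(5, complete_bipartite_edges(2, 3))

if __name__ == "__main__":
    print(row_ones, col_ones)
    print(k23)
```

For the 6-cycle every entry is 0 or 1 and each row and column sums to four. For K_{2,3} the 5 × 6 product consists entirely of ones.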

The following problems could be a goal for future work. 


(i) There is no closed-form formula for the number of edges in the graph realizing
A(G)A(H) in terms of the number of edges in G and the number of edges in H. One can
attempt to find such a formula.

(ii) If A(G)A(H) is graphical then H is a subgraph of the complement $\overline{G}$ of G.
A few edges of $\overline{G}$ are to be removed to get H unless H = $\overline{G}$. One may
try to characterize such edges.

(iii) We know that trees and P_n^m, 2 < m < n − 1, have no non-trivial companions.
One can find some more classes in which no graph has a non-trivial companion.

(iv) One may try to characterize the realizability of products of other graph matrices 
such as the distance matrix or Laplacian matrix, and explore the structural 
property of the realizing graph. 

(v) One can study the relation between the distances in G, H, and the realizing graph
Γ, and hence study the relation between some stress and associated properties
of these graphs.

(vi) One may try to improve the companion search algorithm by making use of some 
properties of the graph factors.

function ADDEDGE(G, H, Γ, z, (i, j))
    n := |V(G)|
    if (i = 0 and j < z) or (j = 0 and i < z) then
        return (\overline{K}_n, \overline{K}_n)
    else if i = j then
        return (\overline{K}_n, \overline{K}_n)
    else if j ∈ N_G(i) then
        return (\overline{K}_n, \overline{K}_n)
    else
        E(H) := E(H) ∪ {(i, j)}
        for k ∈ N_G(i) do
            if (k, j) ∉ E(Γ) then
                E(Γ) := E(Γ) ∪ {(k, j)}
            else
                return (\overline{K}_n, \overline{K}_n)
            end if
        end for
        for k ∈ N_G(j) do
            if (k, i) ∉ E(Γ) then
                E(Γ) := E(Γ) ∪ {(k, i)}
            else
                return (\overline{K}_n, \overline{K}_n)
            end if
        end for
        for u ∈ V(Γ) do
            for v ∈ N_Γ(u) do
                if u ∉ N_Γ(v) then                      ▷ Asymmetric arc
                    for x ∈ N_G(v) do
                        (H′, Γ′) := ADDEDGE(G, H, Γ, z, (x, u))
                        if H′ ≠ \overline{K}_n then
                            return (H′, Γ′)              ▷ Nontrivial companion
                        else
                            continue
                        end if
                    end for
                    return (\overline{K}_n, \overline{K}_n)
                else
                    continue                            ▷ Check next v
                end if
            end for
        end for
        return (H, Γ)                                   ▷ No asymmetric arcs found
    end if
end function


18.1 Historical Background


Calyampudi Radhakrishna Rao, popularly known as “C. R. Rao” (henceforth, Rao), 
was born on September 10, 1920, in the Madras Presidency of British India. He 
is widely regarded as a “living legend” in the field of statistics, and known for 
Cramér-Rao Bound, Rao-Blackwell Theorem, Rao Score Test, Fisher-Rao Distance, 
Generalized Inverse, Quadratic Entropy, and Orthogonal Arrays, to name only a few 
among his numerous path-breaking contributions to the subject. In comparison, the 
topic of weighted distributions, as formulated by Rao in 1963, is probably less well 
known. Yet, it has led to great advances in the subsequent decades both in theoretical 
as well as applied areas of statistics; and curiously, in the arc of Rao’s remarkably long
career, as the present study suggests, this particular topic perhaps made a uniquely
historic contribution.

In 1911-1914, Henry Wellcome, the British pharmaceutical entrepreneur, led, 
over four seasons, an archeological expedition at the Jebel Moya site in the southern 
Gezira plain of Sudan [4]. It is the site of the largest pastoralist cemetery in Northeast 
Africa, currently dated to 5000-500 BCE [9]. The excavated skeletal remains were 
shipped to London. Here, these were warehoused poorly and, in fact, even suffered 
from flooding. After the death of Wellcome in 1936, and by the end of World Wars 
I and II, the remains and the excavation records were transferred to the Duckworth 
Laboratory in the University Museum of Archaeology and Ethnology at Cambridge 
University under the curatorship of J. C. Trevor. 

In March 1946, P. C. Mahalanobis at the Indian Statistical Institute (ISI) in Kolkata
received a telegram from Trevor asking him to send someone to help with the
anthropometric analysis of the collection. Incidentally, Mahalanobis had assigned to Rao
a project on the analysis of an anthropometric survey, which used the D² distance for
grouping of Indian populations [23]. Thus, Rao, along with Ramkrishna Mukherjee,
an anthropologist, went to England in August of 1946 to work as visiting scholars 
at the Cambridge University Museum of Archaeology and Anthropology. The aim 
of their study was “to undertake laboratory examination of identifiable and usable 
adult specimens, to analyze the measurements and observations of the field physical 
anthropologists, and to determine the relationship between the Jebel Moya inhabi- 
tants and other African peoples” [9]. 

The long overdue report of this historic project was finally published by
Mukherjee, Rao, and Trevor in 1955 [27]. Not surprisingly, the study posed certain
technical challenges. In particular, the 40-year hiatus between the excavation and the
anthropometric data analysis had proved to be catastrophic for the remains, which,
according to Trevor, had “disintegrated beyond hope of repair” [9]. Out of more than
3,000 skeletal parts that were originally excavated, only 98 crania, 139 mandibles,
and a few post-cranial elements had survived intact.

Rao’s task was to estimate the unknown mean cranial capacity and other features of 
the original Jebel Moya population from the damaged remains, many of which had 
measurements missing due to damage. Toward maximum likelihood estimation, one 
could write the likelihood function using a multivariate (normal) distribution based 
on the samples with complete measurements, and the derived marginal distribution 
for those with incomplete sets of measurements. However, such estimation is based 
on the assumption that each skull can be considered to belong to a random sample 
from the original population of skulls. The key question would then be: was such an 
assumption valid? 

While looking at the samples that survived, Rao (Fig. 18.6) made a curious obser- 
vation, “are only small skulls preserved?” [38]. If w(c) is the probability that a skull 
of capacity c is unbroken, then in archeological recovery, i.e., during the data gather- 
ing process, it is known that w(c) is a decreasing function of c. The larger a skull, the 
greater is its chance of being damaged upon burial and recovery, as Rao noted in his 
study of yet another collection [41]. This will lead to a larger representation of small
skulls among the unbroken cranial remains, and, therefore, the mean of the available
measurements of the corresponding random variable C will be an underestimate of
the mean cranial capacity of the original population.

Alongside this full-time job, Rao, also a student at King’s College, Cambridge,
was “suggested” by his Ph.D. advisor R. A. Fisher to work simultaneously at
the latter’s genetics laboratory. Fisher had long been interested in the concept of
“ascertainment”, a mode of sampling that depends on the outcome which one
wants to analyze as a dependent variable. In a classical paper, Fisher studied how
the methods of ascertainment can influence the form of the distribution of recorded
observations [12]. One can assume that a model that has been adjusted for
ascertainment will estimate parameters in the general population from which the sample
was drawn. In usual statistical practice, it is generally assumed that a random sample
from the population to be studied can be observed in data. Not surprisingly, therefore,
the key specification of “what population does a sample represent?” might be taken
for granted.

However, in many situations, obtaining a random sample may be practically too 
difficult, or too costly, or indeed, even less preferable to the non-random data that are 
actually available, e.g., from field observations or non-experimental data or a survey 
lacking a suitable sampling frame, as in many “big data” scenarios. Sometimes, the 
events may be observed only in modified form, e.g., in damage models. Indeed, 
certain events may be rendered unobservable by the adoption of the very method 
that is used for making observations, and are, therefore, missed unwittingly. Or they 
might be observable but only with certain probabilities or weights that may depend 
on sample-specific characteristics, such as their inherent conspicuousness, as well 
as other unknown parameters. 

Fisher’s problem of ascertainment was formally addressed by Rao when, during 
August 15-20, 1963, G. P. Patil organized the “First International Symposium on 
Classical and Contagious Distributions” at McGill University in Montreal. Patil rec- 
ollected in the symposium proceedings that Rao had advised him to study discrete 
distributions when he visited ISI a decade earlier. At a second-day session chaired 
by Jerzy Neyman, Rao presented the first paper that formulated and unified weighted 
(discrete) distributions in general terms, which appeared in the proceedings edited 
by Patil [34], and was later reprinted in Sankhya [35]. 


18.2 Weighted Distributions 


Let X be a random variable with probability density function (pdf) g(x; θ) with
parameters θ. The traditional statistical analysis assumes that an identically (as well
as independently) distributed random sample X_1, ..., X_n can always be observed.
Weighted distributions arise when X = x enters the sample with a non-zero weight
w(x, α) that depends on the observed value x and possibly also on some unknown
parameter α. Then x is not an observation on X but on the resulting random variable
X^w which has the weighted pdf:

\[ f(x; \theta, \alpha) = \frac{w(x, \alpha)\, g(x; \theta)}{\omega}. \]

The denominator ω is the normalizing factor chosen to be E[w(X, α)] so that
f(x; θ, α) integrates to 1. The weight w(x, α) could be any non-negative function for
which E[w(X, α)] exists. When x is univariate and non-negative, then the weighted
distribution

\[ f(x; \theta) = \frac{x\, g(x; \theta)}{E(X)} \]

for w(x, α) = x is called size-biased; and length-biased for w(x, α) = |x| where |x|
is some measure of “length” of x. Many length-biased distributions could be shown
to belong to the same family as their unweighted versions [34]. Size-biased sampling 
and related form-invariant weighted distributions were studied by Patil and Ord [28]. 
Patil and Rao [30] studied size-biased sampling with various applications to wildlife 
populations and human families. 
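As a small numerical illustration of size-biased sampling, the sketch below resamples an exponential sample with probability proportional to w(x) = x. The exponential model, the rate value, the sample sizes, and the function name are choices made here for illustration, not taken from the cited studies. For an exponential density with rate λ the size-biased pdf is λ²x·exp(−λx), a gamma density with mean 2/λ, so the observed mean should be about twice the population mean.

```python
import random


def size_biased_resample(sample, k, rng):
    """Draw k values from `sample` with probability proportional to w(x) = x."""
    return rng.choices(sample, weights=sample, k=k)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
rate = 1.0              # hypothetical rate of the unweighted exponential pdf g
original = [rng.expovariate(rate) for _ in range(100_000)]
observed = size_biased_resample(original, 100_000, rng)

mean_original = sum(original) / len(original)  # close to E(X) = 1/rate
mean_observed = sum(observed) / len(observed)  # close to E(X^2)/E(X) = 2/rate

if __name__ == "__main__":
    print(round(mean_original, 2), round(mean_observed, 2))
```

Treating the size-biased observations as a random sample from g would therefore overestimate the mean by a factor of about two, which mirrors the skull example in reverse.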

Weighted distributions are useful for modulating the probabilities of the events as 
they are observed by means of collecting data possibly under less-than-perfect con- 
ditions. In this regard, prominent application areas of weighted distributions include 
truncation and censoring, damage models, size-biased sampling, nonresponse in 
surveys, “file-drawer” problem in meta-analysis, etc. Even mixtures of distributions 
can be shown to belong to this general formulation [22]. Asymmetric data bias is 
common in many scenarios that may have serious practical implications such as for 
unemployment statistics which fail to account for the discouraged workers who have 
stopped looking for jobs during a recession [46]. Interestingly, sometimes biased 
samples available from observational studies may contain more (Fisher) information 
than their randomly drawn counterparts [7]. Rao’s studies, e.g., Rao and Rubin [40], 
Patil and Rao [29], and Rao [37] further advanced the area, which he surveyed during 
his post-ISI years as the University Professor at the University of Pittsburgh [38, 39]. 
For a more recent review of weighted distributions, see Saghir et al. [42]. 

As an example of popular special cases of weighted distributions, we consider
the well-known approach of Azzalini [6] that allows us to adjust the shape of and
introduce skewness to a symmetric distribution, say, with pdf g and cumulative
distribution function (cdf) G. Thus, a random variable X ∈ ℝ is said to have a skew
distribution if its pdf is given by

\[ f_X(x) = 2\, g(x)\, G(\alpha x), \]

based on a shape parameter, α, which helps in adjusting g to account for the
asymmetry observed in the data, which are, thereby, better represented by f_X. In general, a
p-variate Generalized Skew Elliptical (GSE) distribution of random variable X ∈ ℝ^p
has pdf of the form

\[ 2\, g(x; \xi, \Omega)\, \pi(x - \xi), \]

where g is an elliptically contoured pdf with location parameter ξ, scale matrix Ω,
and skewing function π. An example of a GSE pdf due to Branco and Dey [8] is

\[ \frac{2}{|\Omega|^{1/2}}\, g\bigl(\Omega^{-1/2}(x - \xi)\bigr)\, \pi\bigl(\Omega^{-1/2}(x - \xi)\bigr), \]

which could be formulated as a weighted distribution as follows:

\[ f(x; \theta) = \frac{g(\Omega^{-1/2} x)}{|\Omega|^{1/2}}, \qquad w(x) = \pi(\Omega^{-1/2} x), \qquad E[w(X)] = \frac{1}{2} \quad (\text{assuming } \xi = 0). \]

Here, the weight function w distorts the elliptical (symmetric) contours of f through
the presence of asymmetric “outliers” in the observed sample due to GSE [15].
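The univariate skew construction can be read as a weighted distribution with base pdf g, weight w(x) = G(αx), and normalizing factor E[w(X)] = 1/2. The sketch below checks this numerically for the standard normal case, which is chosen here only for illustration. The value α = 3, the integration range, and the helper names are assumptions, and the mean formula δ√(2/π) with δ = α/√(1 + α²) is the standard skew-normal result.

```python
import math


def phi(x):
    """Standard normal pdf, playing the role of g."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf, playing the role of G."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_pdf(x, alpha):
    """Weighted pdf w(x) g(x) / E[w(X)] with w(x) = Phi(alpha x) and E[w(X)] = 1/2."""
    return 2.0 * phi(x) * Phi(alpha * x)


def trapezoid(func, lower, upper, steps=20_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


alpha = 3.0  # hypothetical shape parameter
normalizer = trapezoid(lambda x: phi(x) * Phi(alpha * x), -10.0, 10.0)
total_mass = trapezoid(lambda x: skew_pdf(x, alpha), -10.0, 10.0)
mean = trapezoid(lambda x: x * skew_pdf(x, alpha), -10.0, 10.0)
delta = alpha / math.sqrt(1.0 + alpha * alpha)

if __name__ == "__main__":
    print(normalizer, total_mass, mean, delta * math.sqrt(2.0 / math.pi))
```

The normalizer equals 1/2 for every α because g is symmetric, which is why the factor 2 in the pdf does not depend on the shape parameter.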

Azzalini’s approach could also be extended to distributions that are asymmetric.
Gupta and Kundu [16] introduced the new class of weighted exponential (WE)
distributions by applying Azzalini’s method to the exponential distribution. Let U and
V be two non-negative continuous independent random variables with pdfs f and g
and cdfs F and G, respectively. Then for a shape parameter α > 0, the pdf of random
variable X = U if V < αU is given by

\[ f_X(x) = \frac{f(x)\, G(\alpha x)}{\omega}, \qquad x > 0, \]

where \omega = \int_0^\infty f(x)\, G(\alpha x)\, dx and 0 < ω < ∞. When U and V follow an
exponential distribution with mean 1/λ, a random variable X is said to have a WE distribution,
denoted by WE(α, λ), if its pdf is given by

\[ f_X(x) = \frac{\alpha + 1}{\alpha}\, \lambda e^{-\lambda x} \bigl(1 - e^{-\alpha \lambda x}\bigr), \qquad x > 0. \]

Here, α and λ are the shape and scale parameters respectively [24].
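For the exponential case, the normalizing factor can be worked out in one line. The following derivation is a sketch that uses only the definitions above, with f(x) = λe^{−λx} and G(αx) = 1 − e^{−αλx}:

```latex
\begin{aligned}
\omega &= \int_0^\infty \lambda e^{-\lambda x}\bigl(1 - e^{-\alpha\lambda x}\bigr)\,dx
        = 1 - \frac{\lambda}{\lambda(1+\alpha)}
        = \frac{\alpha}{1+\alpha}, \\[4pt]
f_X(x) &= \frac{f(x)\,G(\alpha x)}{\omega}
        = \frac{\alpha+1}{\alpha}\,\lambda e^{-\lambda x}\bigl(1 - e^{-\alpha\lambda x}\bigr),
        \qquad x > 0 .
\end{aligned}
```

Since ω = P(V < αU), it is also the acceptance probability of the hidden truncation mechanism, which tends to 1 as α grows and recovers the ordinary exponential pdf in the limit.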

As shown by Gupta and Kundu [16], WE distribution possesses useful likelihood 
ratio properties and all its moments can be calculated explicitly. The shape parameter 
governs the shape of the pdf of the WE distribution, which is similar to those of the 
Weibull, gamma, or generalized exponential distributions. This makes WE suitable 
for analyzing positively skewed data. While the exponential distribution is among 
the most popular models used in reliability engineering and life data analysis, when 
wearing out or aging effects are present, the WE distribution could be very effective. 
In fact, it can be viewed as a hidden truncation model, as was also observed earlier
in the case of the skew-normal distribution [5]. Further advances based on the WE
distribution can be found in recent reviews, e.g., Tomy et al. [44].
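The hidden truncation view suggests a direct way to simulate WE data: draw independent exponential pairs (U, V) and keep U only when V < αU. The sketch below does this with hypothetical parameter values. The closed-form mean (1/λ)(1 + 1/(1 + α)) is computed here from the WE pdf and is used only as a sanity check.

```python
import random


def sample_weighted_exponential(alpha, rate, size, rng):
    """Hidden truncation sampler: keep U whenever V < alpha * U (U, V iid exponential)."""
    draws = []
    while len(draws) < size:
        u = rng.expovariate(rate)
        v = rng.expovariate(rate)
        if v < alpha * u:
            draws.append(u)
    return draws


def we_mean(alpha, rate):
    """Mean of WE(alpha, rate), obtained by integrating x against the WE pdf."""
    return (1.0 / rate) * (1.0 + 1.0 / (1.0 + alpha))


rng = random.Random(42)  # fixed seed for reproducibility
alpha, rate = 2.0, 1.0   # hypothetical shape and rate values
draws = sample_weighted_exponential(alpha, rate, 100_000, rng)
sample_mean = sum(draws) / len(draws)

if __name__ == "__main__":
    print(round(sample_mean, 3), round(we_mean(alpha, rate), 3))
```

Because larger values of U are more likely to satisfy V < αU, the retained values are stochastically larger than an ordinary exponential sample, which is the weighting effect described above.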


18.3 Weighted Systems for Data Fusion 


Recent studies have extended the concept of weighted distributions to a weighted 
system of distributions. Using this general approach, we developed a computa- 
tional framework for modeling the combined dynamics of environmental variables 
as observed by multiple sources, e.g., spatially distributed multivariate data streams, 
automated sensors, and surveillance networks [48, 49]. Often, environmental moni- 
toring sites are not located randomly, the system may contain built-in redundancies, 
and could be difficult to maintain without failures. Therefore, by systematic fusion of 
a moderate number of nearby data sources, we can increase the system’s predictive 
capability with a combined model. 

We assume a reference event distribution g_0, and its possible distortions or “tilted”
forms g_1, ..., g_m due to m different sources. This gives us a weighted system of
distributions:

\[ \begin{aligned} g_1(x) &= w_1(x)\, g_0(x) \\ &\;\;\vdots \\ g_m(x) &= w_m(x)\, g_0(x). \end{aligned} \]

Assume that we have data from each of g_0, g_1, ..., g_m. Then, the relationship
between a baseline distribution and its distortions or tilts w_k, k = 1, ..., m, allows us
to do inference on the system’s overall behavior based on the fusion of data observed
from multiple, and possibly dependent, sources. In a weighted system, this is realized
by a model for multiple distributions g_k to be simultaneously “regressed” against a
common reference distribution g_0, thus letting our data fusion approach estimate the
model parameters, including the densities, by using all data sources and not just the
reference sample.

It was noted by Vardi [45] that data from one distribution could form a biased 
sample from another, and this could be modeled by the ratio of their densities. In 
recent years, the original idea of the density ratio model (DRM) has been developed 
into a powerful tool, e.g., Qin [31], Fokianos [13]. In particular, it has been found 
ideally suited for pooling information from multiple populations by Chen and Liu 
[11], Zhang and Chen [47], and Zhang et al. [49]. Other applications include esti- 
mating tail probabilities of exceeding a pre-specified event threshold, especially with 
few observations of extreme events, e.g., de Carvalho and Davison [10], Zhang et 
al. [48, 50], Kedem et al. [20]. In general terms, if an environmental variable X of
interest has only moderate amounts of data from any given source, then, based on
such data, what is the probability of any observation of X exceeding a high threshold
T?

To address this problem, we developed a data fusion approach that pools
information across multiple sources using a multi-sample DRM given by

\[ \frac{g_k(x)}{g_0(x)} = \exp\bigl(\alpha_k + \beta_k^\top h_k(x)\bigr), \qquad k = 1, \ldots, m, \]

where g_0 represents the density of X (say, the level of a pollutant measured at a
monitoring site) and g_1, ..., g_m represent those at its m nearby sites. Instead of making
parametric assumptions on each of these densities, DRM captures the parametric
structure of their ratios with a common model. To prevent bias, large standard errors,
and loss of power in inference, it is important to appropriately select the
multi-dimensional distortion or tilt functions h_k. The choice of such tilt could allow
flexibility in parameterization or even be data-driven as shown by Kedem et al. [19].
Recently, a functional principal component analysis approach was used to identify
an effective approximation for a tilt function in low-dimensional functional subspace
[47].
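A simple case where the density ratio model holds exactly may help fix ideas. If the reference density is N(0, 1) and a distorted source is N(μ, σ²), then log(g_1/g_0) is linear in the two-dimensional tilt h(x) = (x, x²). The sketch below verifies this identity numerically. The normal family, the parameter values, and the function names are illustrative assumptions and not the models used in the cited applications.

```python
import math
from statistics import NormalDist


def tilt(x):
    """Two-dimensional tilt function h(x) = (x, x^2)."""
    return (x, x * x)


def drm_parameters(mu, sigma):
    """alpha and beta such that log(g1/g0) = alpha + beta . h(x) for N(mu, sigma^2) vs N(0, 1)."""
    alpha = -math.log(sigma) - mu ** 2 / (2.0 * sigma ** 2)
    beta = (mu / sigma ** 2, 0.5 * (1.0 - 1.0 / sigma ** 2))
    return alpha, beta


def log_ratio_drm(x, alpha, beta):
    """Log density ratio according to the exponential tilt model."""
    return alpha + sum(b * h for b, h in zip(beta, tilt(x)))


def log_ratio_direct(x, mu, sigma):
    """Log density ratio computed directly from the two normal pdfs."""
    return math.log(NormalDist(mu, sigma).pdf(x)) - math.log(NormalDist(0.0, 1.0).pdf(x))


mu, sigma = 1.0, 1.5  # hypothetical distorted source
alpha, beta = drm_parameters(mu, sigma)
points = [-2.0, -0.5, 0.0, 1.0, 3.0]
gaps = [abs(log_ratio_drm(x, alpha, beta) - log_ratio_direct(x, mu, sigma)) for x in points]

if __name__ == "__main__":
    print(alpha, beta, max(gaps))
```

When σ = 1 the coefficient of x² vanishes and the tilt reduces to h(x) = x, which shows how the dimension of the tilt reflects the kind of distortion between source and reference.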

Let X_0, ..., X_m be the samples from the area of interest and its m nearby sites with
sample sizes n_0, ..., n_m, respectively. The sample X_0 is referred to as the reference
sample; let G denote the corresponding reference cumulative distribution function
(CDF). The fused sample is defined as t = (X_0^⊤, ..., X_m^⊤)^⊤, with size
n = \sum_{k=0}^{m} n_k. Zhang, Pyne, and Kedem [49] described inference based on the
following empirical likelihood due to t:

\[ L(\alpha, \beta, G) = \prod_{i=1}^{n} p_i \prod_{k=1}^{m} \prod_{j=1}^{n_k} \exp\bigl(\alpha_k + \beta_k^\top h_k(X_{kj})\bigr), \]

where p_i = dG(t_i), and the estimates $\hat{\alpha}$, $\hat{\beta}$, and hence the
$\hat{p}_i$’s, are obtained by maximizing the above likelihood with constraints:

\[ \sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i \exp\bigl(\alpha_k + \beta_k^\top h_k(t_i)\bigr) = 1, \qquad k = 1, \ldots, m. \]

This gives the estimated reference CDF $\hat{G}(t) = \sum_{i=1}^{n} \hat{p}_i\, I[t_i \le t]$
and the asymptotic result

\[ \sqrt{n}\,\bigl(\hat{G}(t) - G(t)\bigr) \xrightarrow{d} N\bigl(0, \sigma(t)\bigr), \]

as n → ∞. The expression of σ(t) and other details regarding estimation and the
asymptotic result can be found in our recent paper [48]. Based on the above result, a 95%
confidence interval of the tail probability 1 − G(T) for a given threshold T is given
by

\[ \Bigl( 1 - \hat{G}(T) - z_{0.025} \sqrt{\sigma(T)/n},\;\; 1 - \hat{G}(T) + z_{0.025} \sqrt{\sigma(T)/n} \Bigr). \]

Finally, an optimal choice of tilt function may be made based on a criterion that
ensures better specification of the density ratio structure. For instance, such selection
can be made using the Akaike Information Criterion (AIC) given by
−2 log L($\hat{\alpha}$, $\hat{\beta}$, $\hat{G}$) + 2q, where q is the number of free
parameters in the model [49]. Applications of
the weighted systems framework in modeling augmented reality and estimation of
exceedance probabilities can be found in our recent papers such as Kedem and Pyne
[18], Kedem et al. [20], Zhang et al. [50].
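Once the estimated reference CDF at the threshold and the asymptotic variance are available, the confidence interval for the tail probability is a one-line computation. The sketch below assumes that σ(T) is the asymptotic variance of √n(Ĝ(T) − G(T)) and that an estimate of it has already been computed elsewhere. The function name and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def tail_probability_ci(g_hat_at_threshold, sigma_at_threshold, n, level=0.95):
    """Normal-approximation interval for 1 - G(T).

    g_hat_at_threshold : estimated reference CDF at T
    sigma_at_threshold : asymptotic variance sigma(T) (assumed already estimated)
    n                  : size of the fused sample
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # upper 2.5% point when level = 0.95
    half_width = z * math.sqrt(sigma_at_threshold / n)
    estimate = 1.0 - g_hat_at_threshold
    return estimate - half_width, estimate + half_width


# Hypothetical inputs for illustration only.
lower, upper = tail_probability_ci(g_hat_at_threshold=0.98, sigma_at_threshold=0.02, n=400)

if __name__ == "__main__":
    print(lower, upper)
```

For very small tail probabilities the lower limit of such a symmetric interval can fall below zero, in which case it is natural to truncate it at zero when reporting.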


18.4 Application: Wildfires and Air Pollution Data 


As described above, the capacity of introducing weighting mechanisms to adjust the 
shape or suitably “distort” the standard distributions to be able to model real-world 
data is a powerful tool. To monitor a dynamic phenomenon, modeling the change 
between the distributions of observations made before and after its onset is useful in 
many important applications such as inference of dynamic patterns as they emerge 
in data over space and/or time, e.g., environmental monitoring, statistical ecology, 
public health surveillance, and economics [21]. For describing the changes over 
time in the distribution of an environmentally important variable X, a propagation 
function (PF) models how the frequency of sampling units with the value X = x at 
one point in time must change in order to produce the distribution that occurs at a 
later point in time. Thus, PF provides a useful tool in long-term monitoring studies 
since all changes in a distribution can be examined together rather than just changes 
in single parameters such as the mean. 

Let X and Y be the values of an environmental variable on a population unit at
two consecutive time-points, with the marginal densities f_X(·) and g_Y(·) [17]. The
PF is defined as

\[ w(x) = \frac{g_Y(x)}{f_X(x)}. \]

This gives a weighted distribution:

\[ g_Y(x) = w(x)\, f_X(x), \]

where E_X[w] = E[w(X)] = 1. A similar concept of resource selection function
(RSF) in wildlife habitat modeling was defined as a logistic discriminant function in
terms of a ratio of pdfs for used (f_u) and available (f_a) resources in k habitats:

\[ \frac{f_u(x)}{f_a(x)} = \exp\Bigl(\sum_{i=1}^{k} \beta_i x_i\Bigr), \]

where β_i is the model coefficient for the habitat i’s covariate x_i [25]. We extended
this approach of connecting the log of the density ratio of a monitored variable with
a linear model (that includes site-specific covariates) via outcomes that we represent
as flexibly shaped curves or functional data.

In the era of climate change, the problem of wildfires, their usually complex 
dynamics, and their impact on human and environmental health via air pollution 
are of growing concern worldwide. For this study, a dataset on wildfires that took 
place over the past decade in the U.S. state of California (CA) was obtained from 
the website of the California Department of Forestry and Fire Protection [1]. From 
the dataset, we filtered out minor events by selecting those with a duration of at least
14 days. Thus, we obtained a final list of n = 73 wildfires. For each wildfire, we
used its spatial coordinates to assign a geographical Zone as per the California State 
Plane Coordinate System [2]. Here, a Zone is a label that represents a collection 
of contiguous counties in CA and takes values from 1 (northernmost) through 6 
(southernmost). As per NAD83 specification, Los Angeles county (previously Zone 
7) was assigned to Zone 5. 

We also used the location of each wildfire to identify its nearest surface air
pollution monitoring site from a nationwide network of stations of the U.S. Environmental
Protection Agency (EPA) [3]. Data in the form of hourly measurements of PM2.5 level
(i.e., fine particulate matter of diameter up to 2.5 μm) at the selected site, for 7 days right
before (“week0”) and after (“week1”) the starting date of the wildfire, were obtained
from EPA. If the number of missing values exceeded 25% of the PM2.5 data, then the
next nearest monitoring site was considered. If none of the 3 nearest sites had
sufficient data for the duration of a wildfire, then the event was discarded from the analysis.
Otherwise, the distance (D_i) between a wildfire i’s location and its selected
monitoring site was calculated. We used this covariate to study the effects of “visibility” of any
change in air pollution levels in terms of the distortion in the distributions of PM2.5
data collected before and after the onset of wildfires in California.

In the present study, we (i) applied density ratio to model the change in an
environmental variable measured as the PM2.5 level in the air at the onset of a given
wildfire, and (ii) gave a functional representation of the ratio of the density of this
variable for week1 to that of week0, which serves as the baseline density. Thus, for
each wildfire i in our dataset, we created a functional outcome variable that describes
the log density ratio Y_i(t) (Fig. 18.1) based on a variety of flexible shapes that could be
used for this representation [43]. We defined t as equally spaced points on a
common functional domain, e.g., the interval [0, 1]. Thus, we modeled the log density
ratio not by selecting any explicit functional form but with nonparametric Gaussian
kernel-based estimates computed with the R package “densratio”, which was run with
default parameter settings.

Upon deriving the wildfire-specific curves Y_i(t), i = 1, ..., n, we conducted
functional data analysis to identify outliers and test for potential association of this
outcome with the “visibility” of the wildfire i to its nearest monitoring station with
sufficient measurements. For testing association, the covariate x_i used in the
regression model is the distance D_i (as defined above) for wildfire i. The model is given
by

\[ Y_i(t) = \beta_0(t) + \beta_1(t)\, x_i + \varepsilon_i(t). \tag{18.1} \]

Fig. 18.1 a Densities of PM2.5 for week0 (shown in green) and week1 (shown in orange), and
b Gaussian kernel estimated log density ratio curve of a selected fire


We assume i.i.d. mean zero errors with finite variance. We examined two different
fits for model (18.1) [26]. (i) Point-wise Ordinary Least Squares: For a fixed t, model
(18.1) is a simple linear regression model, so we fit the model separately for each t
and combined the estimates. (ii) PACE Method: We began with functional principal
component analysis on the outcome variables (using the “FPCA” function in the
R package “fdapace”) which yielded a sequence of K basis functions φ_1, ..., φ_K.
Then, with $\bar{Y}(t) = n^{-1} \sum_{i=1}^{n} Y_i(t)$, we have (Fig. 18.1)

\[ Y_i(t) \approx \bar{Y}(t) + \sum_{k=1}^{K} \xi_{ik}\, \phi_k(t), \qquad \xi_{ik} = \int_0^1 \bigl(Y_i(t) - \bar{Y}(t)\bigr)\, \phi_k(t)\, dt, \tag{18.2} \]

\[ \varepsilon_i(t) \approx \sum_{k=1}^{K} e_{ik}\, \phi_k(t), \qquad e_{ik} = \int_0^1 \varepsilon_i(t)\, \phi_k(t)\, dt, \tag{18.3} \]

\[ \beta_0(t) \approx \sum_{k=1}^{K} b_{0k}\, \phi_k(t), \tag{18.4} \]

\[ \beta_1(t) \approx \sum_{k=1}^{K} b_{1k}\, \phi_k(t). \tag{18.5} \]


Fig. 18.2 Functional representation as non-outlier curves based on PM2.5 levels due to 57 fires

The basis approximation provides a form of regularization since some of the
noise in the original curves is filtered out. Sixteen outlier curves were removed. The
non-outlier curves are shown in Fig. 18.2. Furthermore, these basis approximations
yielded a collection of K simple linear regression models

\[ \xi_{ik} = b_{0k} + b_{1k}\, x_i + e_{ik}, \qquad k = 1, \ldots, K, \tag{18.6} \]

which were fit using ordinary least squares. Here, based on the scree plot in Fig. 18.3,
we selected the first K = 3 functional principal components that together explain
more than 90% of the variation among the curves, and their corresponding scores
were used in the above model. The results, in Fig. 18.4, show the estimates of the
slope parameter β_1(t) using both OLS and PACE methods. We note that larger values
of D_i, the distance between the wildfire i and its monitoring site, appear to have
a negative effect on the functional outcome as values of t exceed 0.5 and reach the
higher end of the PM2.5 range. The significance of this effect is assessed by the
point-wise confidence bands for these estimates, assuming homoscedastic (across i,
but not necessarily across t), normally distributed errors.
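The point-wise OLS fit of model (18.1) amounts to running a simple linear regression at every grid point t. The following sketch shows the idea on synthetic curves. The coefficient functions β_0(t) = sin(2πt) and β_1(t) = −t, the noise level, and the grid are invented for illustration and have no connection to the wildfire data.

```python
import math
import random


def pointwise_ols(x, curves):
    """Fit Y_i(t) = b0(t) + b1(t) x_i + e_i(t) separately at each grid point t."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    intercepts, slopes = [], []
    for t_index in range(len(curves[0])):
        y = [curve[t_index] for curve in curves]
        y_bar = sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        slopes.append(slope)
        intercepts.append(y_bar - slope * x_bar)
    return intercepts, slopes


def simulate_curves(x, grid, noise_sd, rng):
    """Synthetic curves with beta0(t) = sin(2 pi t) and beta1(t) = -t (hypothetical)."""
    return [[math.sin(2.0 * math.pi * t) - t * xi + rng.gauss(0.0, noise_sd) for t in grid]
            for xi in x]


rng = random.Random(1)
grid = [k / 20 for k in range(21)]                # equally spaced points on [0, 1]
x = [rng.uniform(0.0, 10.0) for _ in range(60)]   # synthetic covariate values
curves = simulate_curves(x, grid, 0.01, rng)
intercepts, slopes = pointwise_ols(x, curves)

if __name__ == "__main__":
    for t, b0, b1 in zip(grid[::5], intercepts[::5], slopes[::5]):
        print(t, round(b0, 3), round(b1, 3))
```

Point-wise fitting ignores the smoothness of the coefficient functions across t, whereas the basis approach of (18.2) to (18.6) shares information across the grid through a small number of component scores.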

Finally, for the purpose of exploratory data analysis, we plotted the scores of
the first two functional principal components of all wildfire-specific curves as 2D
points in Fig. 18.5. We note significant heterogeneity among these curves, which
is particularly interesting since the wildfires could thus be grouped into at least 4
clusters according to the variations in their PM2.5 density ratio “signatures”. In fact,
some of these unsupervised clusters appear to be associated with the geographical
Zones of the wildfires that they contain. Given the role of the depicted components in
capturing the variation among PM2.5 outcomes, this could lead to a more systematic
analysis of wildfire dynamics in future work.

Fig. 18.3 Scree plot based on functional principal component analysis. The fraction of variance
(FVA) explained (y-axis) by each principal component (x-axis) is shown. The cumulative FVA is
shown (red curve), which for the first 3 components exceeds 90% (blue dashed line)

Fig. 18.4 The estimates for the slope parameter obtained from OLS (red curve) and PACE (blue
curve) methods of model fitting. The corresponding confidence bands are shown

Fig. 18.5 Scatterplot of the first two functional PCA scores (x- and y-axes respectively) of the
curves due to the wildfires. The Zone and the Distance to the monitoring site (in km) of each
wildfire are depicted respectively by the color and the size of the corresponding point


18.5 Discussion


In statistical inference, Rao noted that “wrong specification may lead to wrong
inference, which is sometimes called the third kind of error in statistical parlance. The
problem of specification is not a simple one” [39]. The aim of this study, therefore,
is to draw the attention of students and researchers to the fascinating area of
weighted distributions, which should gain relevance in the current age of data
science. Yet, as observed by Patil and Rao, “although the situations that involve weighted
distributions seem to occur frequently in various fields, the underlying concept of
weighted distributions as a major stochastic concept does not seem to have been
widely recognized” [29]. As more datasets of value that seemingly lack rigorous
design are being collected, and possibly shared through unconventional and
occasionally biased sources, the need for working with such less-than-ideal yet practically
useful observations will have to be addressed in a timely and systematic manner by
statistical pedagogy.

In this paper, we explored some aspects of the fascinating topic of weighted 
distributions with the purpose of introducing the students and researchers to the 
rich historical context, and the profound impact, of one of Rao’s key contributions. 
Interestingly, our use of functional data analysis (FDA) presents a further opportunity 
to note that Rao’s pioneering work on growth curves had, in fact, provided an early 
theoretical foundation for FDA. Motivated by a practical problem of analyzing the 
patterns of growth in a cohort of babies in Kolkata, Rao introduced a novel inferential 
framework [33] for analyzing growth curves for their multivariate structure. Further, 
he addressed many related issues in this area such as estimation, variable selection, 
prediction, and discriminant analysis. In his classic textbook “LSI” [36], Rao treated 
the latter problem in a more general form known as discrimination between composite 
hypotheses. It is, however, beyond the scope of the present study to elaborate on this 
yet another profound and path-breaking contribution of Rao in applied statistics [14]. 

Scientists addressing complex challenges such as monitoring climate change often 
face various data and metadata issues including visibility of rare phenomena, inspec- 
tion paradox, citizen science bias, hidden truncation, and selective reporting. As this 
paper shows, weighted distributions offer a historical perspective for the visionary, 
inter-disciplinary nature of Rao’s early research, some of which appeared in his 1952 
title, “Advanced Statistical Methods in Biometric Research” [32]. The cover of this 
text (C. A. B. Smith: “one of the best textbooks in statistics available anywhere”) 
contained a remarkably futuristic subtitle for its time: “A unified treatment of math- 
ematical distributions, statistical inference and diverse computational techniques”. 
Befittingly, decades later, Rao was honored by the establishment of an eponymous 
Advanced Institute of Mathematics, Statistics and Computer Science (where one of 
the present authors, S. P., was fortunate to have him as a colleague and collaborator). 
We take this opportunity to wish Prof. Rao a longer, healthy, and happy life. 


Acknowledgements The authors thank Ryan Stauffer for providing data for the study. Some parts 
of the paper were adapted from a lecture delivered by S. P. at the C. R. Rao Birth Centenary Session 
of the 22nd Annual Conference of the Society of Statistics, Computer and Applications, Pune, 2020. 


Appendix 


Fig. 18.6 C. R. Rao describing a formula on cranial capacity. For details, see Rao and Shaw [41], 
Rao [36] 


19.1 Introduction 


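Given a matrix $M \in \mathbb{R}^{n \times n}$ and a vector $q \in \mathbb{R}^{n}$, the linear complementarity problem 
LCP$(q, M)$ is to find a solution $(w, z)$ that satisfies 

\[ w - Mz = q, \quad w, z \ge 0 \tag{19.1} \]

and 

\[ w^{T} z = 0. \tag{19.2} \]

Let 

\[ M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix} \tag{19.3} \]

where $A \in \mathbb{R}^{(n-1) \times (n-1)}$, $u, v \in \mathbb{R}^{n-1}$ and $a \in \mathbb{R}$. 
In the LCP$(q, M)$, if we take $w = [\bar{w}^{T}, w_n]^{T}$, $z = [\bar{z}^{T}, \theta]^{T}$ and $q = [\bar{q}^{T}, q_n]^{T}$, where $\bar{w}$, $\bar{z}$ and 
$\bar{q} \in \mathbb{R}^{n-1}$, we have 

\[ \bar{w} - A\bar{z} = \bar{q} + \theta v \tag{19.4} \]

and 

\[ w_n - u^{T}\bar{z} = q_n + \theta a. \tag{19.5} \]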
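
As a routine check on (19.4) and (19.5), the partition (19.3) can be substituted into (19.1). The expansion below is only a verification sketch; it writes $\theta$ for the last component of $z$, which is how the symbol is used in the rest of the chapter.

\[
\begin{bmatrix} \bar{w} \\ w_n \end{bmatrix}
- \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix}
\begin{bmatrix} \bar{z} \\ \theta \end{bmatrix}
= \begin{bmatrix} \bar{q} \\ q_n \end{bmatrix}
\quad\Longleftrightarrow\quad
\begin{cases}
\bar{w} - A\bar{z} - \theta v = \bar{q}, \\
w_n - u^{T}\bar{z} - \theta a = q_n.
\end{cases}
\]

Moving the $\theta$ terms to the right-hand side gives (19.4) and (19.5). The complementarity condition (19.2) splits in the same way as $\bar{w}^{T}\bar{z} + w_n\theta = 0$, and since every term is non-negative, both $\bar{w}^{T}\bar{z} = 0$ and $w_n\theta = 0$ must hold.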


By applying Lemke’s algorithm to LCP$(\bar{q} + \theta v, A)$, we get solutions to (19.4) for 
various values of $\theta$ obtained from the algorithm. A solution to LCP$(q, M)$ is obtained 
when (19.5) is satisfied for some solution $(\bar{w}, \bar{z}, \theta)$ of (19.4). An algorithm based on 
the above procedure is presented in this article. The algorithm can be called the Bordered 
Matrix Algorithm (BMA). Though BMA is essentially a variant of Lemke’s algorithm, 
an example is given to illustrate how it differs from Lemke’s algorithm. 

In the process of ascertaining the existence of a solution to the LCP and developing 
a method for obtaining a solution, if any, various classes of matrices have been 
identified. One such class is the class of Q-matrices. If the matrix $M$ is such that 
LCP$(q, M)$ has a solution for every $q$, then $M$ is a Q-matrix. A basic problem in 
LCP is to characterize Q-matrices. In this paper, an algorithmic approach for the 
characterization of Q-matrices is proposed by using BMA. The algorithm gives a better 
understanding of the problems involved in the Q-characterization by comparing $M$ 
with its leading principal sub-matrix $A$. 

BMA is based on the continuity property of solution to LCP when Lemke’s algo- 
rithm is applied to the LCP. The notations used in the paper, Lemke’s algorithm, 
and the continuity property of solution are explained in the Preliminaries section. 
BMA and its applications in Q-characterization are presented in the Main section. 
The utility of BMA and its prospects in Q-characterization are pointed out in the 
Conclusion. 


19.2 Preliminaries 


In a solution $(w, z)$ to an LCP, $w$ is a function of $z$ and as such $z$ itself is called 
a solution to the LCP. We denote the $j$th column of $B \in \mathbb{R}^{k \times k}$ as $B_{\cdot j}$. A matrix 
$C \in \mathbb{R}^{k \times k}$ is called a complementary matrix if $C_{\cdot j} \in \{I_{\cdot j}, -B_{\cdot j}\}$ where $I$ is the 
identity matrix of order $k$. Let $\bar{k} = \{1, 2, \ldots, k\}$ and for $J \subseteq \bar{k}$, let $J' = \bar{k} - J$. Each 
$C$ is uniquely defined by a $J \subseteq \bar{k}$ where $C_{\cdot j} = -B_{\cdot j}$ if $j \in J$ and $C_{\cdot j} = I_{\cdot j}$ otherwise. The 
vectors $I_{\cdot j}$ and $-B_{\cdot j}$ are complementary vectors. The notation $C(J)$ is used to denote 
the complementary matrix indexed by $J$. $B_{JK}$ stands for the sub-matrix of $B$, whose 
rows and columns are obtained by deleting the rows and columns of $B$ indexed by $J'$ 
and $K'$ respectively. $B_{JJ}$ is denoted in short as $B_J$. A complementary cone generated 
by a complementary matrix $C$, denoted as pos$(C)$, is the set of all non-negative 
linear combinations of the columns of $C$. The union of all complementary cones is 
denoted by $K(B)$. pos$(C)$ is a full cone if rank$(C)$ is $k$. Two cones pos$(C(J_1))$ and 
pos$(C(J_2))$ are adjacent if $J_1 \,\Delta\, J_2 = \{i\}$ for some $i \in J_1 \cup J_2$. The cone generated 
by $r$ complementary vectors is called an $r$-facet. A complementary cone pos$(C)$ is 
degenerate if $C$ is singular. When $C$ is singular and there exists a non-zero 
non-negative $x$ such that $Cx = 0$, pos$(C)$ is called a strongly degenerate cone. pos$(C)$ 
is weakly degenerate if $C$ is singular and pos$(C)$ is not strongly degenerate. 


19.2.1 Lemke’s Algorithm and Continuity of Solution z 


(2.1.1) The properties of Lemke’s algorithm as narrated here are based on the material 
that can be found in Cottle et al. [1]. Let $B \in \mathbb{R}^{k \times k}$. If $p \ge 0$, the 
LCP$(p, B)$ has a solution: $w = p$, $z = 0$. If at least one component of $p$ 
is negative, Lemke’s algorithm applied to LCP$(p, B)$ with artificial 
vector $v > 0$ begins with a solution to LCP$(p + \theta_0 v, B)$ where $p + \theta_0 v \ge 0$ 
lies on a $(k-1)$-facet of pos$(I)$ for some $\theta_0 > 0$. In the subsequent iterations, a 
sequence of $\theta_i$, $i = 1, 2, \ldots$ is obtained where the algorithm fetches a solution 
to LCP$(p + \theta_i v, B)$ lying on a $(k-1)$-facet common to the adjacent cones 
pos$(C_i)$ and pos$(C_{i+1})$. By adopting some standard procedure to avoid 
cycling, the algorithm terminates in a finite number, say $s$, of iterations after 
passing through a unique sequence $\{C_i\}$, $i = 0, 1, 2, \ldots, s$ where $C_0 = I$. 
Unless otherwise specified, hereafter, we shall take $C_s$ as the terminal cone 
of the algorithm, and $C_0$ as the initial cone. The algorithm can start from a 
full cone pos$(C_0)$ instead of pos$(I)$ if there exists a $\theta_0 > 0$ such that $p + \theta v$ 
belongs to the interior of pos$(C_0)$ for $\theta > \theta_0$. 


Proposition 19.1 Lemke’s algorithm applied to LCP$(p, B)$ using artificial vector 
$v > 0$ has the following properties: 

The algorithm starts with a solution $(w, z)$ to LCP$(p + \theta_0 v, B)$ where $\theta_0 = 
\min\{\theta : p + \theta v \ge 0\}$. $(w = p + \theta v, z = 0)$, $\theta \ge \theta_0$ gives a ray of solutions, 
called the primary ray of solutions to LCP$(p + \theta v, B)$. 

The algorithm terminates in one of the following three ways. 

(T1) pos$(C_s)$ contains the vector $p$: 
If $C_s = C(J)$, the algorithm provides a solution to the LCP, viz., $z_J = (C_s^{-1} p)_J$ 
with $z_{J'} = 0$. 

(T2) $(w_{s-1}, z_{s-1})$ is a solution to LCP$(p + \theta_{s-1} v, B)$ at the $(s-1)$th iteration and 
$C_s = C(J)$ contains $v$: 
The algorithm fails to find a solution, but gets a solution to LCP$(v, B)$, viz., 
$\tilde{z}_J = (C_s^{-1} v)_J$ with $\tilde{z}_{J'} = 0$. The algorithm provides a solution to LCP$(p + 
\theta v, B)$ for all $p + \theta v$, $\theta \ge \theta_{s-1}$. The set of such solutions $(z_J = (C_s^{-1} p)_J + 
\theta (C_s^{-1} v)_J,\ z_{J'} = 0)$, $\theta \ge \theta_{s-1}$ is called a secondary ray of solutions to the LCP. 

(T3) $(w_{s-1}, z_{s-1})$ is a solution to LCP$(p + \theta_{s-1} v, B)$ at the $(s-1)$th iteration and 
$C_s = C(J)$ is a strongly degenerate cone: 
The algorithm reaches a point $p + \theta_s v$ lying on a facet of pos$(C_s)$. There 
exists a non-negative vector $x$ such that $C_s x = 0$. $\tilde{z}_J = x_J$, with $\tilde{z}_{J'} = 0$, is a 
solution to LCP$(0, B)$. $z_{s-1} + \gamma \tilde{z}$, $\gamma \ge 0$ is a secondary ray of solutions given 
by the algorithm. 


In Lemke’s algorithm, solutions are obtained to the points $p + \theta v$ by moving 
through a sequence of adjacent cones $C_i$, $i = 0, 1, 2, \ldots, s$ in an interval of the 
line $\{p + \theta v : \theta \in [\theta_{\min}, \infty)\}$ where $\theta_{\min} \ge 0$ is the smallest possible $\theta$ for which 
LCP$(p + \theta v, B)$ has a solution in the implementation of the algorithm. In general, 
the algorithm can start from any full cone pos$(C_0)$ other than pos$(I)$, containing 
the artificial vector $v$ in its interior. We see that the algorithm terminates with a 
solution at pos$(C_s)$ or terminates at pos$(C_s)$ which is either a strongly degenerate 
cone or a cone containing $v$. 


Time Parameter 

(2.1.2) Let the first iteration take place at $t = t_0 = 0$ and the $i$th iteration at time $t_{i-1}$. 
Let $\theta_i = \theta(t_i)$. For $t_i \le t \le t_{i+1}$, let us define $\theta(t) = \theta_i + [(\theta_{i+1} - \theta_i)/(t_{i+1} - 
t_i)](t - t_i)$, $i = 0, 1, 2, \ldots, s-1$. If the algorithm finds a solution to LCP$(p, B)$, 
then there exists a $t$, say $t^* > t_{s-1}$, such that $\theta(t^*) = 0$. If $C_s$ is a strongly degenerate 
cone, then $\theta(t) = \theta(t_{s-1})$ for all $t > t_{s-1}$. If $C_s$ contains $v$, define $\theta(t) = \theta(t_{s-1}) + 
t - t_{s-1}$ for $t > t_{s-1}$. For $t < 0$, $\theta(t) = \theta_0 - t$, so that $\theta$ is continuous at $t = 0$. The range of $t$ is either $(-\infty, t^*]$ or 
$(-\infty, \infty)$. $\theta(t)$ is a continuous function of $t$. The range of $\theta$ is $[\theta_{\min}, \infty)$ for some 
$\theta_{\min} \ge 0$. 

Let $z(t)$ be the solution to LCP$(p + \theta(t)v, B)$. If $C_s$ is a strongly degenerate cone, 
$z(t) = z(t_{s-1}) + (t - t_{s-1})\tilde{z}$ where $\tilde{z}$ is a solution to LCP$(0, B)$ given by $C_s$. If 
$C_s$ contains $v$, $z(t) = z(t_{s-1}) + (t - t_{s-1})\tilde{z}$ where $\tilde{z}$ is a solution to LCP$(v, B)$ 
given by $C_s$. For $t < 0$, $z(t) = z(t_0) - t\tilde{z}$ where $\tilde{z}$ is the solution to LCP$(v, B)$ 
given by $C_0$. $z(t)$ is a one-one continuous function of $t$, which has been pointed 
out by Eagambaram and Mohan [2]. 

Let $v_J$ and $u_J$ denote the vectors of $v$ and $u$ constituted by taking elements from $J$. 


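Definition 19.1 For $J \subseteq \overline{n-1}$, let 

\[ M_J = \begin{bmatrix} A_J & v_J \\ u_J^{T} & a \end{bmatrix}. \]

For $A_J$ non-singular, 

\[ S(M_J / A_J) = a - u_J^{T} A_J^{-1} v_J \]

is the Schur complement of $M_J$ with respect to $A_J$. 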


19.2.2 Some $Q$- and $Q_0$-Matrices 


A matrix $B$ is called a $Q_0$-matrix if $K(B)$ is convex. If $K(B) = \mathbb{R}^{k}$, then $B$ is called 
a $Q$-matrix. If LCP$(p, B)$ has a unique solution for every strictly positive vector $p$, 
then $B$ is called a semi-monotone matrix or an $E_0$-matrix. If all the principal minors of 
$B$ are positive, then $B$ is called a $P$-matrix. The class of matrices having a particular 
property is denoted by the same symbol used to denote the property. 

A principal pivot transform (PPT) of $B$ with respect to an index set $J \subseteq \bar{k}$ is the 
matrix $-C(J)^{-1} C(J')$ where $C(J)$ and $C(J')$ are complementary matrices related 
to $B$, and $C(J)$ is non-singular. If $B$ is a $Q_0$-matrix, a PPT of $B$ is also a $Q_0$-matrix. 


19.3 Main Results 


19.3.1 Bordered Matrix Algorithm (BMA) 


Algorithm 1: Let 

\[ M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix} \]

where $v > 0$. Let $q = [\bar{q}^{T}, q_n]^{T}$. Apply the parametric Lemke’s algorithm to LCP$(\bar{q} + 
\theta v, A)$ with time parameter $t$. Let $(\bar{w}(t), \bar{z}(t))$ be a solution to LCP$(\bar{q} + \theta(t)v, A)$. 

Let 

\[ q_n(t) = w_n(t) - u^{T}\bar{z}(t) - a\theta(t). \tag{19.6} \]

We note that if $\theta(t) > 0$, $w_n(t) = 0$. Since $(\bar{w}(t), \bar{z}(t))$ is a continuous function 
of $t$, $q_n(t)$ is also a continuous function of $t$. The bordered matrix algorithm (BMA) 
aims to solve LCP$(q, M)$ by using the solutions to LCP$(\bar{q} + \theta(t)v, A)$. 


Step 1: $\bar{q} \ge 0$. 

(1.1) If $q_n \ge 0$, the solution is $w = q$ and $z = 0$. 

(1.2) If $q_n < 0$ and $a > 0$, the solution is $\bar{w} = \bar{q} + (-q_n/a)v$, $w_n = 0$, $\bar{z} = 0$, $z_n = 
-q_n/a$. 

Otherwise go to Step 2. 

Step 2: At least one component of $\bar{q}$ is negative. 

Carry out Lemke’s Algorithm on LCP$(\bar{q}, A)$ with artificial vector $v$. 

(2.1) In the first iteration $\theta = \theta_0$. 

(2.1.1) If $a > 0$ and $q_n < -\theta_0 a$, find $\theta > \theta_0$ such that $q_n = -\theta a$. The solution is 
$\bar{w} = \bar{q} + \theta v$, $w_n = 0$, $\bar{z} = 0$ and $z_n = \theta$. 

(2.1.2) If $a < 0$ and $q_n > -a\theta_0$, let $\theta = -q_n/a$. The solution is $\bar{w} = \bar{q} + \theta v$, $w_n = 
0$, $\bar{z} = 0$ and $z_n = \theta$. 

Otherwise, proceed with the parametric LCP$(\bar{q} + \theta(t)v, A)$ and go to Step 3. 

Step 3: According to the time parametric version, let $\bar{z}(t)$ be the solution at time $t$ 
and $q_n(t) = w_n(t) - u^{T}\bar{z}(t) - a\theta(t)$. 

(3.1) At the $i$th iteration $q_n(t_i) = -u^{T}\bar{z}(t_i) - a\theta(t_i)$. 

(3.1.1) If $q_n$ lies between $q_n(t_i)$ and $q_n(t_{i+1})$, obtain $t$ lying between $t_i$ and $t_{i+1}$ 
such that $q_n(t) = q_n$. The solution is $\bar{z} = \bar{z}(t)$ and $z_n = \theta(t)$. 

(3.1.2) If $t = t^*$ so that $\theta(t^*) = 0$ at any stage of iteration, check $q_n(t^*)$. If $q_n$ lies 
between $q_n(t_i)$ and $q_n(t^*)$, choose $t$ such that $q_n(t) = q_n$. The solution is $\bar{z} = \bar{z}(t)$ 
and $z_n = \theta(t)$. 

If $q_n(t^*) < q_n$, the solution is: $(\bar{w}, \bar{z})$ is the solution to LCP$(\bar{q}, A)$, $w_n = q_n - 
q_n(t^*)$, and $z_n = 0$. 

Otherwise, go to Step 4. 


Step 4: 

(4.1) The algorithm terminates at a strongly degenerate cone $C_s$ or the cone $C_s$ 
contains $v$. 

(4.1.1) $C_s$ is strongly degenerate. $\tilde{z}$ is a solution to LCP$(0, A)$. 

(4.1.1(a)) $\beta = -u^{T}\tilde{z} > 0$ and $q_n \ge q_n(t_{s-1})$. The solution is $\bar{z}(t_{s-1}) + [(q_n - q_n(t_{s-1}))/\beta]\,\tilde{z}$ 
and $z_n = \theta(t_{s-1})$. 

(4.1.1(b)) $\beta = -u^{T}\tilde{z} < 0$ and $q_n \le q_n(t_{s-1})$. The solution is $\bar{z}(t_{s-1}) - [(q_n(t_{s-1}) - q_n)/\beta]\,\tilde{z}$ 
and $z_n = \theta(t_{s-1})$. 

(4.1.2) $C_s$ contains $v$. $\tilde{z}$ is the solution to LCP$(v, A)$ corresponding to $C_s$. 

(4.1.2(a)) $\beta = -u^{T}\tilde{z} - a > 0$ and $q_n \ge q_n(t_{s-1})$. The solution is $\bar{z}(t_{s-1}) + [(q_n - q_n(t_{s-1}))/\beta]\,\tilde{z}$ 
and $z_n = \theta(t_{s-1}) + [q_n - q_n(t_{s-1})]/\beta$. 

(4.1.2(b)) $\beta = -u^{T}\tilde{z} - a < 0$ and $q_n \le q_n(t_{s-1})$. The solution is $\bar{z}(t_{s-1}) - [(q_n(t_{s-1}) - q_n)/\beta]\,\tilde{z}$ 
and $z_n = \theta(t_{s-1}) - [q_n(t_{s-1}) - q_n]/\beta$. 

Otherwise, the algorithm terminates without finding a solution. 


BMA is essentially based on Lemke’s algorithm and the continuity property of $q_n(t)$. 
Though the description of the algorithm is lengthy, it is easy to visualize and implement 
by following the continuity property of $q_n(t)$. In BMA, solutions are obtained for 
vectors $q$ whose first $n-1$ components are fixed at $\bar{q}$. The set of $q$’s for which solutions 
are obtained therefore lies on a line perpendicular to $\mathbb{R}^{n-1}$. The direction vector of the algorithm is 
the $n$-dimensional unit vector with the $n$th component unity. BMA is used in this 
article only as a tool to study the Q-properties of matrices. However, we give below 
an example of a matrix $M$ and a vector $q$ for which BMA solves LCP$(q, M)$, but 
Lemke’s algorithm fails. 


Example 1 


Let 

\[ M = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & 1 \\ 2 & -1 & -1 \end{bmatrix}. \]

LCP$(q, M)$ where $q = [-2, -1, -2]^{T}$ is not solvable by Lemke’s algorithm with a 
positive artificial vector. But BMA solves it and gets the solution $z = [3, 2.5, 1.5]^{T}$. 
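
A claimed solution can be checked directly against (19.1) and (19.2). The sketch below does this for Example 1 in exact rational arithmetic. The helper name `is_lcp_solution` is illustrative only and does not come from the chapter.

```python
from fractions import Fraction as Fr

def is_lcp_solution(M, q, z):
    """Check w = q + M z >= 0, z >= 0 and w^T z = 0, i.e. (19.1)-(19.2)."""
    w = [qi + sum(mij * zj for mij, zj in zip(row, z)) for row, qi in zip(M, q)]
    nonneg = all(x >= 0 for x in w) and all(x >= 0 for x in z)
    complementary = sum(wi * zi for wi, zi in zip(w, z)) == 0
    return nonneg and complementary

# Data of Example 1
M = [[1, -1, 1],
     [-1, 1, 1],
     [2, -1, -1]]
q = [-2, -1, -2]
z = [Fr(3), Fr(5, 2), Fr(3, 2)]   # the solution reported in the text

print(is_lcp_solution(M, q, z))
```

Here $w = q + Mz$ evaluates to the zero vector, so complementarity holds trivially for this particular solution.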


Let us see how BMA works with regard to the above example. The initial table 
with artificial vector $v = [1, 1]^{T}$ is 

Initial table with v as artificial vector 
| x1 | x2 | x3 | x4 | x5 | x6 = -v | q  |
| 1  | 0  | 0  | -1 | 1  | -1      | -2 |
| 0  | 1  | 0  | 1  | -1 | -1      | -1 |
| 0  | 0  | 1  | -2 | 1  | 1       | -2 |

Lemke’s algorithm is applied to LCP$(\bar{q}, A)$ where $\bar{q} = [-2, -1]^{T}$, $v = [1, 1]^{T}$, 
$u^{T} = [2, -1]$ and $a = -1$. The most negative component of $\bar{q}$ is $-2$. The pivot 
element is the first component of x6. After pivoting, the basic variables are $w_2$ and 
$\theta$. The entering variable is $z_1$ = x4. 

Iteration-0 
| Basic variable | x1 | x2 | x3 | x4 | x5 | x6 | q          | Ratio   |
| theta          | -1 | 0  | 0  | 1  | -1 | 1  | 2          | 2/1 = 2 |
| w2             | -1 | 1  | 0  | 2  | -2 | 0  | 1          | 1/2     |
| -u^T           | 0  | 0  | 1  | -2 | 1  | 1  | q_n(0) = 2 |         |

The solution at the first iteration is $z_1 = 0$, $z_2 = 0$ and $z_3 = \theta = 2$, and $q_n(0) = z_1 \cdot 
(-u_1) + z_2 \cdot (-u_2) + \theta \cdot (-a) = 0 \cdot (-2) + 0 \cdot (1) + 2 \cdot (1) = 2$. x1 is driven 
out and x4 = $z_1$ enters. The ratios $q_i/(\text{x4})_i$ for the rows 1 and 2 are computed 
and the minimum ratio is 1/2. The pivot element is the second component of 
x4 which is 2. In the next iteration, we get the following table. 

Iteration-1 
| Basic variable | x1   | x2   | x3 | x4 | x5 | x6 | q            |
| theta          | -1/2 | -1/2 | 0  | 0  | 0  | 1  | 3/2          |
| z1             | -1/2 | 1/2  | 0  | 1  | -1 | 0  | 1/2          |
| -u^T           | 0    | 0    | 1  | -2 | 1  | 1  | q_n(1) = 1/2 |

Here $q_n(1) = (-2) \cdot (1/2) + 1 \cdot 0 + 1 \cdot (3/2) = 1/2$. x2 = $w_2$ is driven out and x5 = $z_2$ enters. The 
first and second components of x5 are non-positive. Lemke’s algorithm terminates 
for LCP$(\bar{q}, A)$ at the strongly degenerate cone $C(\{1, 2\})$. $\tilde{z} = (1, 1)$ is the non-trivial 
solution to LCP$(0, A)$. $\beta = -u^{T}\tilde{z} = (-2) \cdot 1 + 1 \cdot (1) = -1 < 0$. $q_n(1) + 2.5 \cdot \beta = 
1/2 + 2.5 \cdot (-1) = -2$. Therefore the solution is $(1/2, 0, 3/2) + 2.5\,(1, 1, 0) = 
(3, 2.5, 1.5)$. 
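
The arithmetic of this walk-through can be replayed in a few lines. The following is a minimal sketch of this one example, not a general BMA implementation: the two pivots are hard-coded from the text, the x3 = $w_3$ column is omitted because it is never pivoted, and the names `pivot`, `q_n`, `z_ray` and `gamma` are illustrative assumptions.

```python
from fractions import Fraction as Fr

def pivot(T, r, c):
    """Gauss-Jordan pivot of the tableau T (a list of rows) on entry (r, c)."""
    p = T[r][c]
    T[r] = [x / p for x in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [x - f * y for x, y in zip(T[i], T[r])]

u, a = [2, -1], -1          # last row of M is [u^T, a]
q_n_target = -2             # last component of q

def q_n(z_bar, theta):
    """q_n(t) = w_n - u^T z_bar - a*theta with w_n = 0, as in (19.6)."""
    return -sum(ui * zi for ui, zi in zip(u, z_bar)) - a * theta

# Columns: w1, w2, z1, z2, theta, rhs  (rows of the initial table for LCP(q_bar, A))
T = [[Fr(1), Fr(0), Fr(-1), Fr(1), Fr(-1), Fr(-2)],
     [Fr(0), Fr(1), Fr(1), Fr(-1), Fr(-1), Fr(-1)]]

pivot(T, 0, 4)              # theta enters, w1 leaves; basis = (theta, w2)
theta0 = T[0][5]
qn0 = q_n([0, 0], theta0)

pivot(T, 1, 2)              # z1 enters (minimum ratio 1/2), w2 leaves; basis = (theta, z1)
theta1, z1 = T[0][5], T[1][5]
qn1 = q_n([z1, 0], theta1)

# z2 would enter next, but its column has no positive entry: ray termination.
col = [T[0][3], T[1][3]]
assert all(x <= 0 for x in col)
z_ray = [-col[1], Fr(1)]    # z1 grows by -col[1] per unit increase of z2
theta_ray = -col[0]         # theta does not change along this ray

beta = -sum(ui * zi for ui, zi in zip(u, z_ray))   # Step (4.1.1)
gamma = (q_n_target - qn1) / beta                  # step length along the ray
z_final = [z1 + gamma * z_ray[0], gamma * z_ray[1], theta1 + gamma * theta_ray]

print(qn0, qn1, beta, gamma, z_final)
```

The intermediate values `qn0`, `qn1` and `beta` correspond to $q_n(0)$, $q_n(1)$ and $\beta$ in the text, and `z_final` is the vector $(z_1, z_2, z_3)$.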

Let us continue to use M as a bordered matrix with A as the leading n — | principal 
sub-matrix. BMA can be applied even though the first n — 1 components of the last 
column of M are not positive, as seen from the following remarks. 


Remark 19.1 If one of the columns of $M$ is positive except the diagonal element, 
then there exists a permutation matrix $P$ such that $PMP^{T}$ has its last column positive 
except the last diagonal element. BMA can be applied to $PMP^{T}$ to get solutions to 
the original LCP with $M$. 


Remark 19.2 If $v$ is in the interior of a complementary cone, the corresponding 
Principal Pivot Transform yields a bordered matrix with $v$ positive. Then BMA can 
be applied to process the LCP. 


19.3.2 Characterization of Q-Matrices Using BMA 


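The following theorem gives a necessary condition for the matrix $M$ to be a Q-matrix. 


Theorem 19.1 Let 

\[ M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix}. \]

Then the following are two equivalent necessary conditions for $M$ to be a Q-matrix 
(here $\mathcal{C}(A)$ denotes the set of complementary matrices related to $A$). 

(i) For $\bar{q} \in \mathbb{R}^{n-1}$ there exist a $\theta_0$ and a $C_0 \in \mathcal{C}(A)$ such that $\bar{q} + \theta v \in$ pos$(C_0)$ 
for all $\theta \ge \theta_0$. 
(ii) $v$ is in the interior of $K(A)$. 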


Proof (i) Let $w = [\bar{w}^{T}, w_n]^{T}$, $z = [\bar{z}^{T}, z_n]^{T}$ be a solution to LCP$(q, M)$ where $q = [(\bar{q} + \beta v)^{T}, q_n]^{T}$, $\beta > 
0$. Then $\bar{w} - A\bar{z} = \bar{q} + (z_n + \beta)v$. As $\beta \to \infty$, $\bar{q} + (z_n + \beta)v$ is in a particular full 
cone pos$(C_0)$, $C_0 \in \mathcal{C}(A)$, for all $\beta \ge \beta_0$ for some large positive $\beta_0$, because the 
complementary cones are finite in number. 

(i) $\Rightarrow$ (ii) If $v$ does not lie in the interior of $K(A)$, then there exist a $\bar{q}$ and $\epsilon_0 > 0$ 
such that $v + \epsilon\bar{q}$ is not in $K(A)$ for all $0 < \epsilon < \epsilon_0$. This leads to the contradiction that 
$\bar{q} + (1/\epsilon)v$ is not in $K(A)$ for all $0 < \epsilon < \epsilon_0$. 

(ii) $\Rightarrow$ (i) If $v$ is in the interior of $K(A)$, then for a given $\bar{q}$, there exists $\epsilon_0$ such that 
$v + \epsilon\bar{q}$ is in the interior of $K(A)$ for $0 < \epsilon < \epsilon_0$. This implies that $\bar{q} + \beta v$ is in the 
interior of $K(A)$ for $\beta > 1/\epsilon_0$. As there are only a finite number of complementary 
cones, we see that (i) holds. 


It follows from Theorem 19.1 that 


Corollary 19.1 If $M$ is a Q-matrix, then there exists a PPT $\tilde{M}$ of $M$ such that the 
last column of $\tilde{M}$ is non-negative except the last element which can be of any sign. 


Example 2 

Let 

\[ M = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ a & b & c \end{bmatrix}. \]

Here $v = [1, -1]^{T}$ is not in the interior of $K(A)$ and so $M$ 
is not a Q-matrix for any $a, b, c$. 


Now we obtain some additional necessary conditions for $M$ to be a Q-matrix as 
follows. 


Theorem 19.2 $M$ is a Q-matrix only if for a given $\bar{q} \in \mathbb{R}^{n-1}$, 

(i) there exists a full cone $C_0 = C(J) \in \mathcal{C}(A)$ and $\theta_0 \ge 0$ such that $\bar{q} + \theta v$ is in 
pos$(C(J))$ for all $\theta \ge \theta_0$ 
and 

(ii) $-u^{T}\tilde{z}_0 - a = -S(M_J/A_J) < 0$ where $\tilde{z}_0$ is the solution to LCP$(v, A)$ 
corresponding to $C(J)$. 


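Proof Necessity of (i) follows from Theorem 19.1. Additional necessity of (ii) is as 
follows: 

Let $q = [\bar{q}^{T}, \beta]^{T}$. If $M$ is a Q-matrix, then for a fixed $\bar{q}$, LCP$(q, M)$ has a solution 
for any $\beta$, especially for a large negative $\beta$. 

We see that there exists a large $\theta_0 > 0$, such that for $\theta > \theta_0$, $\bar{q} + \theta v$ does not 
belong to any complementary cone that does not contain $v$. For such a $\theta_0$, $\bar{q} + \theta v$ 
with $\theta > \theta_0$ belongs to each non-singular complementary cone containing $v$. 

For $\theta > \theta_0$, consider the solution $\bar{z}(\theta)$ to LCP$(\bar{q} + \theta v, A)$ given by the complementary 
cone $C_0 = C(J)$ that contains $v$: 

\[ \bar{z}_J(\theta) = (C_0^{-1}\bar{q})_J + \theta\,(C_0^{-1}v)_J, \]

and $\bar{z}_{J'}(\theta) = 0$. Corresponding to the above solution, we get a solution $[\bar{z}(\theta)^{T}, \theta]^{T}$ 
to LCP$(q, M)$ where 

\[ \beta = -u^{T}\bar{z}(\theta) - \theta a = -u_J^{T}(C_0^{-1}\bar{q})_J - \theta\,\bigl(a + u_J^{T}(C_0^{-1}v)_J\bigr). \]

When $\theta = 0$, LCP$(q, M)$ has a solution for all $q$ with $\beta = w_n - u^{T}\bar{z}(0)$, $w_n \ge 0$. 
This ensures that $\beta$ can take all values greater than $-u^{T}\bar{z}(0)$. If $\beta$ has to take all large 
negative values, then the coefficient of $\theta$ must be negative. Since $(C_0^{-1}v)_J = -A_J^{-1}v_J$, this means 

\[ a + u_J^{T}(C_0^{-1}v)_J = a - u_J^{T}A_J^{-1}v_J = S(M_J/A_J) > 0. \]

Therefore, one of the non-singular complementary cones $C_0$ containing $\bar{q} + \theta v$ 
for $\theta \ge \theta_0$ should satisfy Condition (ii). Hence the proof. 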


Example 3 

Let $M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix}$ where $A$ is an $E_0$-matrix, $v > 0$, and $a < 0$. Then $M$ is not a 
Q-matrix. 

Proof pos$(I)$ = pos$(C_{\emptyset})$ is the only cone containing $v$. $S(M_{\emptyset}/A_{\emptyset}) = a < 0$. The 
necessary condition (ii) of Theorem 19.2 for $M$ to be Q is not satisfied. 


Example 4 

Let 

\[ M = \begin{bmatrix} -1 & 1 & 1 \\ 3 & -2 & 1 \\ 2 & -1 & -3 \end{bmatrix}. \]

Here, $v$ is in more than one cone of $K(A)$. But none of these 
cones satisfies Condition (ii) of Theorem 19.2. Hence $M$ is not a Q-matrix. For example, 
LCP$(q, M)$ with $q = [1, 1, -10]^{T}$ has no solution. Here, as another example, 
we note that $-M$ is not a Q-matrix if $M$ is a P-matrix, because $S(-M_J/(-A)_J) < 0$ 
for all $J \subseteq \overline{n-1}$ when $M$ is a P-matrix. 
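
Both claims in Example 4 can be checked by brute force for a matrix this small. The sketch below is an illustration under stated assumptions rather than the chapter's method: it computes the Schur complement of Definition 19.1 for every index set $J$, and it enumerates the $2^n$ complementary index sets of LCP$(q, M)$. The function names are hypothetical, indices are 0-based, and the enumeration is conclusive only when no singular principal sub-matrix is encountered, which is why the count of skipped sets is returned.

```python
from fractions import Fraction as Fr
from itertools import combinations

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    T = [[Fr(x) for x in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if T[r][c] != 0), None)
        if piv is None:
            return None
        T[c], T[piv] = T[piv], T[c]
        p = T[c][c]
        T[c] = [x / p for x in T[c]]
        for r in range(n):
            if r != c and T[r][c] != 0:
                f = T[r][c]
                T[r] = [x - f * y for x, y in zip(T[r], T[c])]
    return [T[r][n] for r in range(n)]

def schur(M, J):
    """S(M_J / A_J) = a - u_J^T A_J^{-1} v_J for the bordered matrix M."""
    n = len(M) - 1
    x = solve([[M[i][j] for j in J] for i in J], [M[i][n] for i in J])
    if x is None:
        return None
    return Fr(M[n][n]) - sum(Fr(M[n][j]) * xj for j, xj in zip(J, x))

def lcp_solutions(M, q):
    """Try every index set J with w_J = 0 and z_{J'} = 0.
    Returns (list of solutions z, number of singular index sets skipped)."""
    n = len(M)
    sols, skipped = [], 0
    for k in range(n + 1):
        for J in combinations(range(n), k):
            zJ = solve([[M[i][j] for j in J] for i in J], [-q[i] for i in J])
            if zJ is None:
                skipped += 1
                continue
            z = [Fr(0)] * n
            for j, val in zip(J, zJ):
                z[j] = val
            w = [Fr(q[i]) + sum(Fr(M[i][j]) * z[j] for j in range(n)) for i in range(n)]
            if all(x >= 0 for x in z) and all(x >= 0 for x in w):
                sols.append(z)
    return sols, skipped

M = [[-1, 1, 1],
     [3, -2, 1],
     [2, -1, -3]]
schur_values = {J: schur(M, J) for J in [(), (0,), (1,), (0, 1)]}
sols, skipped = lcp_solutions(M, [1, 1, -10])
print(schur_values)
print(sols, skipped)
```

Every Schur complement comes out negative, so Condition (ii) of Theorem 19.2 fails for each $J$, and the enumeration returns no solution with no index set skipped.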


The following theorem appearing in Sivakumar et al. [7] can be proved alterna- 
tively by using Theorem 19.2. 


Theorem 19.3 The matrix $M$ of rank unity is a Q-matrix if and only if each element 
of $M$ is positive. 


Proof A complementary cone $C(J)$ related to $A$ being non-singular implies that $|J| \le 1$. 
If $|J| = 1$ and $C(J)$ is non-singular, then $S(M_J/A_J) = 0$. Therefore, the $C(J)$ that 
satisfies Conditions (i) and (ii) of Theorem 19.2 is $C_0 = I$ and the corresponding 
Schur complement is $a$, which must be positive. Again, by Theorem 19.2, $v$ is in pos$(I)$ 
and so $v$ is non-negative. Since any principal rearrangement of a Q-matrix is a Q-matrix, we see 
that all the diagonal entries are positive, and all other elements are non-negative. This 
is possible for a rank one matrix only if each element of $M$ is positive. On the other 
hand, if each element of $M$ is positive, it is known that $M$ is a $P_0 \cap Q$-matrix (Katta 
[4]). Hence the result. 


The necessary conditions in Theorems 19.1 and 19.2 help us to rule out that M 
is a Q-matrix. The next result includes one more condition to bring out a set of 
conditions sufficient for M to be a Q-matrix. 


Theorem 19.4 $M$ is a Q-matrix if for a given $\bar{q} \in \mathbb{R}^{n-1}$, 

(i) there exists a full cone pos$(C(J)) \subseteq K(A)$ and $\theta_0 \ge 0$ such that $\bar{q} + \theta v$ is in 
pos$(C(J))$ for all $\theta \ge \theta_0$, 
(ii) $-u^{T}\tilde{z}_0 - a < 0$ where $\tilde{z}_0$ is the solution to LCP$(v, A)$ corresponding to $C(J)$ 
and 
(iii) if $C_s$ is the terminal cone in Lemke’s Algorithm to solve LCP$(\bar{q}, A)$ with artificial 
vector $v$, then 

(a) $C_s$ contains $\bar{q}$ and Lemke’s algorithm terminates with a solution 
or 

(b) $C_s$ is a strongly degenerate cone, $\tilde{z}$ is the non-trivial solution to LCP$(0, A)$ 
obtained from $C_s$ and $-u^{T}\tilde{z} > 0$ 
or 

(c) $C_s$ contains $v$, $\tilde{z}$ is the solution to LCP$(v, A)$ corresponding to $C_s$ and 
$-u^{T}\tilde{z} - a > 0$. 


Proof Consider 

\[ q_n(t) = w_n(t) - u^{T}\bar{z}(t) - a\theta(t) \]

defined in the BMA applied to solve LCP$(q, M)$. Conditions (i) and (ii) ensure that 
$q_n(-\infty) = -\infty$. $q_n(t)$ is a continuous function of $t$ and it attains $\infty$ under the 
conditions (iii)(b) or (iii)(c). Under (iii)(a) let the solution to LCP$(\bar{q}, A)$ be obtained 
at $t = t^*$. Then $w = [\bar{w}(t^*)^{T}, \gamma]^{T}$, $z = [\bar{z}(t^*)^{T}, 0]^{T}$ is a solution to LCP$([\bar{q}^{T}, q_n(t^*) + \gamma]^{T}, M)$ for 
$\gamma \ge 0$, which shows that the $n$th component of $q$ takes all real values. Since $\bar{q}$ is 
arbitrary, we see that LCP$(q, M)$ has a solution for all $q$. 


The example below is an application of Theorem 19.4 to obtain a range of a for 
which M is a Q-matrix. 


Example 6 

\[ M = \begin{bmatrix} -1 & 0 & 1 \\ 0 & -1 & 1 \\ 1 & 2 & a \end{bmatrix} \]

is a Q-matrix when $a \in (-1, 0) \cup (-3, -2)$. 


Proof Here, $n = 3$, $A = -I$, $v = [1, 1]^{T}$, $\bar{q} = [q_1, q_2]^{T}$, $q = [q_1, q_2, q_3]^{T}$ and $M$ is 
singular for $a = -3$. 

Case 1: $a \in (-1, 0)$ and $q_1 < q_2$. 
Start BMA from $C_0 = C(\{1\})$ which contains $\bar{q} + \theta v$ for all large $\theta$. 
$C(\{1\})$ is non-singular, which gives the solution to LCP$(v, A)$ as $\tilde{z}(v) = [1, 0]^{T}$. 
$-u^{T}\tilde{z}(v) - a = -1 - a < 0$. So, Conditions (i) and (ii) of Theorem 19.4 
are satisfied. Condition (iii)(a) of Theorem 19.4 is satisfied if $\bar{q} \ge 0$. Otherwise, 
the next adjacent cone of Lemke’s algorithm is $C_s = I$. The solution to 
LCP$(v, A)$ corresponding to $C_s = I$ is $\tilde{z}(v) = 0$ and so $-u^{T}\tilde{z}(v) - a > 0$, 
which satisfies Condition (iii)(c) of Theorem 19.4. Therefore, for the given 
$\bar{q}$, LCP$(q, M)$ has a solution for all $q_3$. 

Case 2: $a \in (-1, 0)$ and $q_1 > q_2$. 
The proof is similar to Case 1. Here we have $C_0 = C(\{2\})$ and $C_s = I$. 
$-u^{T}\tilde{z}(v) - a < 0$ with respect to $C_0$ and is positive with respect to $C_s$. 

Case 3: $a \in (-3, -2)$ and $q_1 < q_2$. 
We take $C_0 = C(J)$, where $J = \{1, 2\}$. The Schur complement 
$S(M_J/A_J) > 0$. If $\bar{q}$ is in pos$(C(J))$, Condition (iii)(a) is invoked and if 
$\bar{q}$ lies outside pos$(C(J))$, Condition (iii)(c) is invoked, to show that $q_3$ can 
take all values for a given $\bar{q}$. 

Case 4: $a \in (-3, -2)$ and $q_1 > q_2$. 
This case is similar to Case 3. Here, again $C_0 = C(\{1, 2\})$ and $C_s = C(\{1\})$ 
when $\bar{q}$ lies outside pos$(C(J))$. 
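
Example 6 can be probed numerically by solving LCP$(q, M)$ on a grid of integer vectors $q$ for sample values of $a$. This is a sanity check on sampled data, not a proof of the Q-property. The grid, the three values of $a$ and all function names below are assumptions made for illustration. The helper raises an error if a singular principal sub-matrix appears, since the simple enumeration would then be inconclusive; this does not happen for the values of $a$ used here.

```python
from fractions import Fraction as Fr
from itertools import combinations, product

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    T = [[Fr(x) for x in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next((r for r in range(c, n) if T[r][c] != 0), None)
        if piv is None:
            return None
        T[c], T[piv] = T[piv], T[c]
        p = T[c][c]
        T[c] = [x / p for x in T[c]]
        for r in range(n):
            if r != c and T[r][c] != 0:
                f = T[r][c]
                T[r] = [x - f * y for x, y in zip(T[r], T[c])]
    return [T[r][n] for r in range(n)]

def solvable(M, q):
    """True if some index set J (w_J = 0, z_{J'} = 0) gives w, z >= 0."""
    n = len(M)
    for k in range(n + 1):
        for J in combinations(range(n), k):
            zJ = solve([[M[i][j] for j in J] for i in J], [-q[i] for i in J])
            if zJ is None:
                raise ValueError("singular principal sub-matrix: enumeration inconclusive")
            z = [Fr(0)] * n
            for j, val in zip(J, zJ):
                z[j] = val
            w = [q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
            if all(x >= 0 for x in z) and all(x >= 0 for x in w):
                return True
    return False

def example6(a):
    return [[-1, 0, 1], [0, -1, 1], [1, 2, a]]

grid = list(product(range(-3, 4), repeat=3))      # 343 integer vectors q
sample_a = [Fr(-1, 2), Fr(-5, 2), Fr(-3, 2)]      # two values inside the intervals, one outside
unsolved = {a: [q for q in grid if not solvable(example6(a), q)] for a in sample_a}

for a in sample_a:
    print(a, len(unsolved[a]))
```

For $a = -1/2$ and $a = -5/2$ every sampled $q$ is solvable, which is consistent with the example. For $a = -3/2$, which lies outside both intervals, the enumeration finds no solution for $q = (-1, 0, 0)$, so that particular matrix is not a Q-matrix.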
Example 7 

$M = \begin{bmatrix} A & 0 \\ u^{T} & a \end{bmatrix}$ is a Q-matrix if $A$ is a Q-matrix and $a > 0$. 

Here, 0 is in the interior of $K(A)$. If $\bar{z}$ is a solution to LCP$(\bar{q}, A)$, $q_n$ can take values 
exceeding $-u^{T}\bar{z}$ by increasing $w_n$, and can take values less than $-u^{T}\bar{z}$ by increasing 
$\theta$. Therefore, $M$ is a Q-matrix. 


Example 8 

Let $M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix}$, where $A$ is an $E_0$-matrix, $a > 0$ and $u, v > 0$. It follows from 
Theorem 19.4 that $M$ is a Q-matrix. It can also be seen that $M$ is an $E_0$-matrix as 
well. A particular case is that $A = 0$. 


Example 9 

Consider a matrix $M$ where $M_{ij} > 0$ for $j > i$ and $M_{ij} < 0$ for $j < i$. It follows by 
induction from the previous example that $M$ is an $E_0 \cap Q$-matrix. 


Example 10 

Let $M = \begin{bmatrix} A & v \\ u^{T} & a \end{bmatrix}$ where $A$ is a Q-matrix processable by Lemke’s algorithm with 
artificial vector $v > 0$ and $a > 0$. Applying BMA from pos$(I)$, we see that $M$ is a 
Q-matrix. 


Whether A is a Q-matrix or not is relevant in examining the Q-property of M. We 
give below a necessary condition for M to be a Q-matrix when A is not a Q-matrix. 


Theorem 19.5 Let $A$ be not a Q-matrix. Then $M$ is a Q-matrix only if for $\bar{q}$ not in 
$K(A)$, there exists a $\theta > 0$ such that $\bar{q} + \theta v$ is in a cone pos$(C)$ where 
either 

(i) pos$(C)$ is a strongly degenerate cone and the solution $\tilde{z}$ to LCP$(0, A)$ given by $C$ 
satisfies $-u^{T}\tilde{z} > 0$ 
or 

(ii) pos$(C)$ is a full cone containing $v$ and the solution $\tilde{z}$ to LCP$(v, A)$ given by $C$ 
satisfies $-u^{T}\tilde{z} - a > 0$. 


Proof If $M$ is a Q-matrix, then for a given $\bar{q}$ not in $K(A)$, LCP$(q, M)$ where $q = 
[\bar{q}^{T}, \delta]^{T}$ has a solution for all real $\delta$. Since LCP$(\bar{q}, A)$ has no solution, for every $\delta > 0$ 
there is a $\theta > 0$ such that LCP$(\bar{q} + \theta v, A)$ has a solution $\bar{z}(\theta)$ and $\delta(\theta) = -u^{T}\bar{z}(\theta) - 
a\theta = \delta$. In order that there exists a $\delta(\theta)$ as large positive as desired, $\bar{q} + \theta v$ lies in a 
cone pos$(C)$, for some $\theta > 0$, such that either (i) pos$(C)$ is strongly degenerate with 
$-u^{T}\tilde{z} > 0$ where $\tilde{z}$ is a non-trivial solution to LCP$(0, A)$; or (ii) pos$(C)$ contains $v$ 
and $-u^{T}\tilde{z} - a > 0$, where $\tilde{z}$ is a solution to LCP$(v, A)$ corresponding to pos$(C)$. 
Hence the proof. 


Theorem 19.5 gives a necessary condition which implies that if 
$A$ is not a Q-matrix, there must be a strongly degenerate cone or a cone containing $v$ 
satisfying the conditions of the theorem in order that $M$ is a Q-matrix. So, as a corollary, 
we have the result given by Jeter and Pye [3]: 


Corollary 19.2 If $A$ is not a Q-matrix and $M$ is a Q-matrix, then there exists a 
solution $z$ to LCP$(q, M)$ with $q = [0, 0, \ldots, 1]^{T}$, such that $[u^{T}, a]z < 0$. 


19.4 Conclusion 


BMA is a variant of Lemke’s algorithm and is based on the continuity property of 
solution to the parametric LCP. The algorithm can be carried out when one of the 
columns of M is positive except the diagonal element. As seen from Example 1, 
there are matrices processable by BMA but not by Lemke’s algorithm. The classes 
of matrices processable by BMA are yet to be explored. 

The motivation for this article is to characterize Q-matrices by studying the 
relationships between an $n \times n$ matrix $M$ and its $(n-1) \times (n-1)$ leading principal 
sub-matrix $A$ in terms of LCP properties. Cottle et al. [1] present an augmented 
matrix approach to solve LCP$(\bar{q}, A)$ by means of a solution to LCP$(q, M)$. Jeter and 
Pye [3] made such a study and obtained some necessary conditions on the Q-property 
involving $M$ and $A$. Projesh et al. [6] characterized $M$ as a Q-matrix where $A$ is of 
a particular type. Here, BMA gives an algorithmic approach for such studies. Some 
necessary conditions for a matrix to be a Q-matrix are obtained using BMA. The 
utility of BMA in identifying Q-matrices is illustrated by examples. 

Corollary 19.2 in the case of $Q_0$-matrices is proved by Murthy and Parthasarathy 
[5]. One may try to obtain further results on necessary conditions for $Q_0$-properties of 
matrices on the lines of the theorems for Q-matrices given in this paper, which would 
help in the characterization of $Q_0$-matrices in general. The study of completely-$Q_0$ 
and fully-$Q_0$ matrices is another area where the algorithmic approach given in this 
paper can be useful. The invariance of the Q-property under principal rearrangement 
and PPT is an added advantage in the application of BMA. 


Acknowledgements The author is thankful to Professor T. Parthasarathy and Professor K. C. 


20.1 Introduction 


Descending endomorphisms of groups were defined in [4] and used for developing an 
alternative proof of a result earlier proved by Miller in [5], characterising the normal 
subgroups of direct products. A descending endomorphism is a group endomorphism 
that descends to every quotient group. Equivalently, it is an endomorphism that 
leaves every normal subgroup invariant, which immediately implies that all power 
endomorphisms [2] and inner automorphisms are descending endomorphisms. This 
also implies that in any Dedekind group (in particular, any Abelian group), the 
descending endomorphisms are exactly the power endomorphisms. We are interested 
in the question: how can we determine the descending endomorphisms of non-Abelian 
groups? In [4], it is shown that every descending endomorphism of a direct product of 
finitely many groups is a pointwise product of direct products of the direct factors, and 
a condition is provided for determining exactly which of the pointwise products define 
descending endomorphisms of the product group. In this paper, we prove a general 
result showing that all endomorphisms of a certain class of groups, which includes 
all symmetric groups, are descending. We then apply this together with the result on 
descending endomorphisms of direct products in order to determine all descending 
endomorphisms of direct products of finitely many symmetric groups. Next, we give 
an explicit description and enumeration of all descending endomorphisms of dihedral 
groups and direct products of finitely many dihedral groups. In the final two sections, 
we determine the descending endomorphisms of Hamiltonian groups and dicyclic 
groups. 

We begin by reviewing the definition and some of the basic results that will be 
used in proving the main results of this paper. 


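Definition 20.1 [4, Definition 2.1] An endomorphism $\delta$ of a group $G$ is a descending 
endomorphism if, for every epimorphism $\varphi : G \twoheadrightarrow Q$ of $G$ onto a group $Q$, there 
exists an endomorphism $\bar{\delta}$ of $Q$ such that the diagram in Eq. (20.1) commutes: 

\[
\begin{array}{ccc}
G & \xrightarrow{\;\delta\;} & G \\
{\scriptstyle\varphi}\downarrow & & \downarrow{\scriptstyle\varphi} \\
Q & \xrightarrow{\;\bar{\delta}\;} & Q
\end{array}
\tag{20.1}
\]

Then $\bar{\delta}$ is unique and is said to be the descended endomorphism or descent of $\delta$ to 
$Q$ along $\varphi$, or the endomorphism induced by $\delta$. 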


By the first isomorphism theorem, epimorphic images of a group $G$ are isomorphic 
to quotients of $G$. This leads us to the following characterisation of descending 
endomorphisms in terms of normal subgroups. We write $H \le G$ (respectively, $H \trianglelefteq 
G$) to mean that $H$ is a subgroup (respectively, normal subgroup) of $G$. We call a 
subgroup $H$ of a group $G$ invariant under an endomorphism $\varepsilon$ of $G$ if $\varepsilon(H) \le H$. 


Lemma 20.1 [4, Lemma 2.4] An endomorphism $\delta$ of a group $G$ is a descending
endomorphism of $G$ if and only if for every normal subgroup $N$ of $G$, $\delta(N) \le N$.
Then the descent of $\delta$ to any quotient $G/N$ along the canonical projection is defined
by $\bar\delta(xN) = \delta(x)N$ for all $xN \in G/N$.


Proof Consider a group $Q$ onto which there is an epimorphism from $G$. Then by
the first isomorphism theorem, we can identify $Q$ with a quotient $G/N$ of $G$ where
$N \trianglelefteq G$, with the epimorphism being replaced by the canonical projection
$\varphi\colon G \twoheadrightarrow G/N$. Now, let $\delta$ be an endomorphism of $G$. Then by (20.1), $\delta$ is a descending
endomorphism of $G$ if and only if $G/N$ has an endomorphism $\bar\delta$ such that
$\bar\delta \circ \varphi = \varphi \circ \delta$. But $\varphi(x) = xN$ for all $x \in G$. Thus, $\delta$ is descending if and only if $G/N$ has an
endomorphism $\bar\delta$ defined by $\bar\delta(xN) = \delta(x)N$ for all $x \in G$. Observe that $\bar\delta$ is
well-defined if and only if $xN = yN$ implies $\delta(x)N = \delta(y)N$ for all $x, y \in G$. However,
this is equivalent to the condition that $x \in N$ implies $\delta(x) \in N$ for all $x \in G$, i.e.,
$\delta(N) \le N$.
Recall that a power endomorphism, defined as an endomorphism that leaves all
subgroups invariant, can be characterised as an endomorphism that maps every
element to some power of itself. From Lemma 20.1, we obtain an analogous
characterisation of descending endomorphisms. For an element $x$ of a group $G$, let $\langle x^G \rangle$
denote the normal closure of $x$ in $G$, i.e., the intersection of all normal subgroups of
$G$ that contain $x$.


Corollary 20.1 An endomorphism $\delta$ of a group $G$ is descending if and only if for
every element $x \in G$, $\delta(x) \in \langle x^G \rangle$.


Proof If $\delta$ is a descending endomorphism of $G$, then $\langle x^G \rangle \trianglelefteq G$ is left invariant by
$\delta$, and hence $\delta(x) \in \langle x^G \rangle$. Conversely, suppose $\delta$ is an endomorphism of $G$ such
that $\delta(x) \in \langle x^G \rangle$ for all $x \in G$, and let $N$ be any normal subgroup of $G$. Then for
all $x \in N$, $\langle x^G \rangle \le N$ and hence $\delta(x) \in N$, which implies that $\delta(N) \le N$. Thus, $\delta$ is
descending.


Corollary 20.1 provides a general answer to the question posed in Sect. 20.1
as to how to determine the descending endomorphisms of non-Abelian groups. It 
proves to be especially useful in obtaining the automorphisms of groups defined by 
presentations, as will be seen in Sects. 20.3 and 20.5. 

Using the characterisation of descending endomorphisms given in Lemma 20.1, 
we can show that the property of being a descending endomorphism is inherited by 
the descent as well. The proof given here is different from that in [4] where this result 
originally appears. 


Corollary 20.2 [4, Theorem 2.2] Let $G$ be a group, and let $\varphi\colon G \twoheadrightarrow Q$ be an
epimorphism. If $\delta$ is a descending endomorphism of $G$, then its descent $\bar\delta$ to $Q$ is a
descending endomorphism of $Q$.


Proof As in the proof of Lemma 20.1, we identify $Q$ with a quotient $G/N$ for some
$N \trianglelefteq G$, and $\varphi$ with the canonical projection from $G$ to $G/N$. Now, we show that $\bar\delta$
leaves all normal subgroups of $G/N$ invariant, which would imply, by Lemma 20.1,
that $\bar\delta$ is a descending endomorphism of $G/N$. By the correspondence theorem, every
normal subgroup of $G/N$ is of the form $M/N$ for some normal subgroup $M$ of $G$ such
that $N \le M$. To see that $\bar\delta(M/N) \le M/N$, consider any $xN \in M/N$ with $x \in M$
(without loss of generality) and observe that $\bar\delta(xN) = \delta(x)N$. Since $M \trianglelefteq G$ is left
invariant by $\delta$, $\delta(x) \in M$, which implies that $\delta(x)N \in M/N$.


The set of all descending endomorphisms of $G$, denoted by $\operatorname{Des} G$, is a
submonoid of the endomorphism monoid $\operatorname{End} G$. The identity automorphism and the
trivial endomorphism, which maps every element to the identity element, are descending
endomorphisms of any group. Lemma 20.1 shows that all inner automorphisms and
power endomorphisms are descending endomorphisms. Power endomorphisms are
studied in [2, 3, 6]. If $A$ is an Abelian group then all its subgroups are normal, and
hence $\operatorname{Des} A$ is the set of all power endomorphisms of $A$. A power endomorphism $\varepsilon$
of $A$ is universal if there exists an integer $k$ such that for all $x \in A$, $\varepsilon(x) = x^k$. The
following result proved in [4] will be used here.




Theorem 20.1 [4, Theorem 2.5] Suppose that A is an Abelian group satisfying any 
one of the following properties: 


(i) A is a direct product of cyclic groups. 
(ii) A has finite exponent. 
(iii) A has an element of infinite order. 


Then the descending endomorphisms of A are exactly its universal power endomor- 
phisms. 


A non-Abelian group may have descending endomorphisms that are neither power 
endomorphisms nor inner automorphisms, as shown in the following example. 


Example 


Consider the symmetric group $S_3$ generated by the permutations $r = (123)$ and $s = (12)$.
Let $\delta$ be the endomorphism of $S_3$ that maps $r$ to $1$ and fixes $s$. Then $\delta$ is neither
an inner automorphism nor a power endomorphism. The only proper, non-trivial
normal subgroup of $S_3$ is $\langle r \rangle$, and $\delta(\langle r \rangle) = 1 \le \langle r \rangle$. Hence, $\delta \in \operatorname{Des} S_3$.
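
The example can be checked by brute force. The following is a minimal sketch, not part of the original argument: it models $S_3$ as permutations of {0, 1, 2} (a 0-based relabelling of {1, 2, 3}, which is an assumption of the sketch), enumerates all endomorphisms through the images of $r$ and $s$, and tests each one against the criterion of Corollary 20.1. All function names are illustrative.

```python
from itertools import permutations, product

E = (0, 1, 2)                      # identity permutation of {0, 1, 2}
G = list(permutations(range(3)))   # all six elements of S_3
r, s = (1, 2, 0), (1, 0, 2)        # a 3-cycle and a transposition

def mul(p, q):
    """Composition p∘q: apply q first, then p."""
    return tuple(p[q[k]] for k in range(3))

def inv(p):
    """Inverse permutation of p."""
    return tuple(sorted(range(3), key=lambda k: p[k]))

def power(p, k):
    out = E
    for _ in range(k):
        out = mul(out, p)
    return out

def generated(gens):
    """Subgroup generated by gens (products suffice in a finite group)."""
    elems, frontier = {E}, [E]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = mul(g, h)
            if gh not in elems:
                elems.add(gh)
                frontier.append(gh)
    return elems

def normal_closure(x):
    """Subgroup generated by all conjugates of x."""
    return generated([mul(mul(g, x), inv(g)) for g in G])

def endomorphisms():
    """All maps r^a s^b -> R^a S^b that respect multiplication."""
    found = []
    for R, S in product(G, repeat=2):
        f = {mul(power(r, a), power(s, b)): mul(power(R, a), power(S, b))
             for a in range(3) for b in range(2)}
        if all(f[mul(g, h)] == mul(f[g], f[h]) for g in G for h in G):
            found.append(f)
    return found

def is_descending(f):
    """Corollary 20.1: f(x) lies in the normal closure of x for every x."""
    return all(f[x] in normal_closure(x) for x in G)

ends = endomorphisms()
delta = next(f for f in ends if f[r] == E and f[s] == s)  # the map of the example
rs = mul(r, s)
print(len(ends), all(is_descending(f) for f in ends))
print(delta[rs] in generated([rs]))  # membership of delta(rs) in the subgroup <rs>
```

The last line tests whether $\delta(rs)$ lies in $\langle rs \rangle$; a power endomorphism would have to satisfy this, so the check separates the two notions on this example.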


Now we define some notation that will be followed throughout the rest of the
paper. Let $G = \prod_{i=1}^{m} G_i$ be the direct product of $m$ groups $G_1, \ldots, G_m$. Let $\pi_i$ be
the canonical projection of $G$ onto $G_i$. For $\delta \in \operatorname{Des} G$, the restriction of $\delta$ to $G_i$, denoted
by $\delta|_{G_i}$, is a descending endomorphism of $G_i$.


Definition 20.2 If $\delta_i \in \operatorname{Des} G_i$, $i = 1, \ldots, m$, then their pointwise product, denoted
by $\delta = \prod_{i=1}^{m} \delta_i$, is the endomorphism of $G = \prod_{i=1}^{m} G_i$ defined by
$\delta(x_1, \ldots, x_m) = (\delta_1(x_1), \ldots, \delta_m(x_m))$ for all $(x_1, \ldots, x_m) \in G$.


Let $N$ be a fixed normal subgroup of $G$, and let $N_i = \pi_i(N)$ be its projection into
$G_i$. Then $\prod_{i=1}^{m} [G_i, N_i] \le N \le \prod_{i=1}^{m} N_i$, with all three subgroups normal in $G$.

If $H \le G$ is such that $\prod_{i=1}^{m} [G_i, N_i] \le H$, then with respect to $N$, $\bar H$ denotes
the unique subgroup of $G / (\prod_{i=1}^{m} [G_i, N_i])$ corresponding to $H$ under the bijection
defined by the correspondence theorem, i.e., $\bar H = H / (\prod_{i=1}^{m} [G_i, N_i])$. In particular,
$\bar G = G / (\prod_{i=1}^{m} [G_i, N_i])$.

Note that $H \trianglelefteq G$ if and only if $\bar H \trianglelefteq \bar G$. Since $\overline{\prod_{i=1}^{m} N_i}$ itself is central in $\bar G$,
every subgroup $H$ of $G$ lying between $\prod_{i=1}^{m} [G_i, N_i]$ and $\prod_{i=1}^{m} N_i$ is normal in
$G$. We identify $\bar G$ with the product $\prod_{i=1}^{m} \bar G_i$ via their canonical isomorphism,
where $\bar G_i = G_i / [G_i, N_i]$. Let $\delta_i$ be a fixed descending endomorphism of $G_i$. Then
the pointwise product $\prod_{i=1}^{m} \delta_i$ is an endomorphism of $G$, but is not necessarily
descending. However, $\delta_i$ descends to $\bar\delta_i \in \operatorname{Des} \bar G_i$ along the canonical
epimorphism from $G_i$ to $\bar G_i$, and $\prod_{i=1}^{m} \bar\delta_i \in \operatorname{End} \bar G$. Thus, $\prod_{i=1}^{m} \bar\delta_i$ maps each element
$(x_1, \ldots, x_m)(\prod_{i=1}^{m} [G_i, N_i]) \in \bar G$ to $(\delta_1(x_1), \ldots, \delta_m(x_m))(\prod_{i=1}^{m} [G_i, N_i])$. The
following results appearing in [4] provide a complete description of the descending
endomorphisms of direct products.
Theorem 20.2 [4, Theorem 3.9] Let $G = \prod_{i=1}^{k} G_i$ be the direct product of $k$ groups
$G_1, \ldots, G_k$. Let $\delta_i \in \operatorname{Des} G_i$, $i = 1, \ldots, k$, and let $\delta = \prod_{i=1}^{k} \delta_i$ denote their
pointwise product. Then $\delta$ is a descending endomorphism of $G$ if and only if for all
$N_i \trianglelefteq G_i$, $i = 1, \ldots, k$, the restriction $\bar\delta|_{\bar N}$ is a descending endomorphism of $\bar N$, where
$\bar N = \prod_{i=1}^{k} (N_i / [G_i, N_i])$ and $\bar\delta = \prod_{i=1}^{k} \bar\delta_i$.


Theorem 20.3 [4, Theorem 3.10] Let $A = \prod_{i=1}^{k} A_i$ be the direct product of $k$ torsion
Abelian groups $A_1, \ldots, A_k$. If $\delta_i \in \operatorname{Des} A_i$, $i = 1, \ldots, k$, then


(i) for each $a_i \in A_i$ there exists $m_i \in \mathbb{Z}$ such that $\delta_i(a_i) = a_i^{m_i}$, $i = 1, \ldots, k$, and
(ii) the pointwise product $\delta_1 \times \cdots \times \delta_k$ is a descending endomorphism of $A$ if and
only if for every pair $i, j \in \{1, \ldots, k\}$, and for all $a_i \in A_i$ and $a_j \in A_j$,
$m_i \equiv m_j \pmod{\gcd(|a_i|, |a_j|)}$.


Remark 20.1 In Theorem 20.3, whenever $A_i$ has finite exponent $n_i$, we may replace
$|a_i|$ in (ii) by $n_i$, thus making the condition independent of individual elements
$a_i \in A_i$.
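
As an illustration of Theorem 20.3 and Remark 20.1, the following sketch checks the congruence condition on a product of two finite cyclic groups. The groups are written additively as $\mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$, and each factor carries the universal power map $a \mapsto m\,a$; both choices, and the function names, are assumptions of the sketch. Since the product is Abelian, being descending is the same as mapping every element into the cyclic subgroup it generates.

```python
from math import gcd
from itertools import product

def is_power_endomorphism(n1, n2, m1, m2):
    """Does (a, b) -> (m1*a, m2*b) send each element of Z_n1 x Z_n2
    into the cyclic subgroup generated by that element?"""
    for a, b in product(range(n1), range(n2)):
        target = (m1 * a % n1, m2 * b % n2)
        # All multiples k*(a, b); n1*n2 values of k cover the whole subgroup.
        cyclic = {(k * a % n1, k * b % n2) for k in range(n1 * n2)}
        if target not in cyclic:
            return False
    return True

def congruence_condition(n1, n2, m1, m2):
    """Condition (ii) of Theorem 20.3 with |a_i| replaced by the exponents."""
    return (m1 - m2) % gcd(n1, n2) == 0

def agree(n1, n2):
    """Compare the brute-force test with the congruence for all exponents."""
    return all(is_power_endomorphism(n1, n2, m1, m2)
               == congruence_condition(n1, n2, m1, m2)
               for m1 in range(n1) for m2 in range(n2))

print(agree(4, 6), agree(3, 5), agree(6, 9))
```

For coprime orders the congruence is vacuous, so every pair of exponents is admissible; for orders 4 and 6 only pairs of equal parity survive.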


20.2 Descending Endomorphisms of Symmetric Groups 


In this section, we will first show that all endomorphisms of $S_n$ are descending,
and then determine the descending endomorphisms of direct products of symmetric
groups. We begin by recalling some elementary facts about symmetric groups. Let $S_n$
be the symmetric group, considered as the group of all permutations of the set
$X = \{1, 2, \ldots, n\}$, where $n \ge 5$. Let $A_n$ be the alternating group, which is the subgroup
of $S_n$ consisting of all even permutations in $S_n$. Then $A_n$ is the unique proper,
non-trivial normal subgroup of $S_n$, has index 2 in $S_n$, is non-Abelian, and
$[S_n, A_n] = [S_n, S_n] = A_n$.

$S_n$ is centreless. For $n \ne 6$, all automorphisms of $S_n$ are inner automorphisms,
and $\operatorname{Aut} S_n \cong S_n$. For $n = 6$, the inner automorphism group has index 2 in $\operatorname{Aut} S_6$,
which is isomorphic to a semidirect product of $S_6$ and $C_2$. Here, $C_2$ denotes the cyclic
group $\{1, -1\}$ under multiplication.

Consider any endomorphism $\delta$ of $S_n$ that is neither trivial nor an automorphism.
Then $\ker \delta = A_n$. Let $s \in S_n$ be the permutation $(12)$ that interchanges 1 and 2 and
fixes all other elements of $X$. Then $\delta$ is uniquely determined by its action on $s$, as
$|S_n : A_n| = 2$. Moreover, any mapping of $s$ to an element of order 2 defines such
an endomorphism $\delta$, since we may regard $\delta$ as the composition of a homomorphism
from $S_n$ to $C_2$ sending $s$ to $-1 \in C_2$, with an embedding of $C_2$ into $S_n$ sending $-1$ to
the desired element of order 2. Thus, the non-trivial, non-bijective endomorphisms
of $S_n$ are in one-to-one correspondence with the elements of $S_n$ that have order 2.

First, we establish a result from which it will follow that all endomorphisms of
$S_n$ are descending.


Theorem 20.4 If the normal subgroups of a finite group $G$ form a chain under
inclusion, then all endomorphisms of $G$ are descending.

Proof Since $G$ is finite, its chain of normal subgroups has a finite length, say $m$. The
result is obvious for $m = 1, 2$. Assume for the sake of induction that the result holds
for a group whenever its normal subgroups form a chain of length less than $m$.

Let $\delta$ be any endomorphism of $G$. If $\delta$ is injective, then it is an automorphism of
$G$, and hence maps each normal subgroup of $G$ to a normal subgroup of the same
order. However, since the normal subgroups form a chain, no two distinct normal
subgroups can have the same order, and, therefore, $\delta$ maps each to itself.

If $\delta$ is not injective, let $K = \ker \delta$, and consider $G/K$. The canonical projection
from $G$ to $G/K$ defines a bijection between the normal subgroups of $G$ containing
$K$ and the normal subgroups of $G/K$, and, therefore, the normal subgroups of $G/K$
form a chain of length less than $m$ (since $K \ne 1$).

Define $\bar\delta\colon G/K \to G/K$ by $\bar\delta(gK) = \delta(g)K$ for all $g \in G$. This map is a
well-defined endomorphism of $G/K$, since $K = \ker \delta$.

Now let $N$ be any normal subgroup of $G$. Then either $N \le K$ or $K < N$. If
$N \le K$, then $\delta(N) = 1 \le N$. Otherwise, $N/K$ is a normal subgroup of $G/K$ and by
the induction hypothesis, $\bar\delta(N/K) \le N/K$. Thus, for all $x \in N$, $\bar\delta(xK) = \delta(x)K = yK$
for some $y \in N$, which implies that $y^{-1}\delta(x) \in K \le N$, and therefore, $\delta(x) \in N$.
Thus, $\delta(N) \le N$ in this case as well.

Since $\delta(N) \le N$ for all $N \trianglelefteq G$, it follows from Lemma 20.1 that $\delta$ is a descending
endomorphism of $G$.


As observed earlier, $S_n$ has a unique proper, non-trivial normal subgroup, $A_n$, and
therefore the normal subgroups of $S_n$ form the chain $1 < A_n < S_n$. Hence, Theorem
20.4 applies to $S_n$, and $\operatorname{Des} S_n = \operatorname{End} S_n$. Now, any descending endomorphism of a
direct product of symmetric groups is a pointwise product of endomorphisms of the
direct factors. The following result characterises all such pointwise products.


Theorem 20.5 Let $G = \prod_{i=1}^{k} G_i$ where $G_i = S_{n_i}$ and $n_i \ge 5$, $i = 1, \ldots, k$. Let $s_i \in G_i$
be the permutation $(12)$, and let $M_i = A_{n_i}$ denote the alternating subgroup of $G_i$.
If $\delta_i \in \operatorname{End} G_i = \operatorname{Des} G_i$, $i = 1, \ldots, k$, then the pointwise product $\delta = \prod_{i=1}^{k} \delta_i$ is a
descending endomorphism of $G$ if and only if either $\delta_i(s_i) \in M_i$ for all $i \in \{1, \ldots, k\}$,
or $\delta_i(s_i) \notin M_i$ for all $i \in \{1, \ldots, k\}$.


Proof Let $\delta_i \in \operatorname{End} G_i$, $i = 1, \ldots, k$, and let $\delta = \prod_{i=1}^{k} \delta_i$. Since $G_i = S_{n_i}$,
$\delta_i \in \operatorname{Des} G_i$, by Theorem 20.4. We shall use Theorem 20.2 to establish the result. For
this purpose, we consider all possible normal subgroups $N_i$ of $G_i$, $i = 1, \ldots, k$, and
apply the condition in Theorem 20.2 to them. Let $\bar N = \prod_{i=1}^{k} (N_i / [G_i, N_i])$.

Recall that each $N_i \trianglelefteq G_i$ must be $1$, $M_i = A_{n_i}$, or $G_i = S_{n_i}$. Let
$\bar N_i = N_i / [G_i, N_i]$, and let $\bar\delta_i$ be the descent of $\delta_i$ to $\bar N_i$, which is a finite Abelian group.
By Theorem 20.1, for each $i = 1, \ldots, k$, there exists an integer $m_i$ such that for all
$a_i \in \bar N_i$, $\bar\delta_i(a_i) = a_i^{m_i}$.

If $N_i = 1$ or $M_i$, then $[G_i, N_i] = N_i$, so $\bar N_i = 1$. In this case, the integer $m_i$
mentioned in the preceding paragraph can be taken to be any integer, and therefore
we may ignore this case, as it is independent of the condition in Theorem 20.3 (ii).

If $N_i = G_i$, then $[G_i, N_i] = M_i$, so $\bar N_i = S_{n_i} / A_{n_i}$ has order 2. Now, if $\delta_i(s_i) \in M_i$,
then $\bar\delta_i$ is the trivial endomorphism, and therefore $m_i \equiv 0 \pmod 2$. If $\delta_i(s_i) \notin M_i$,
then $\bar\delta_i$ is the identity automorphism of $\bar N_i$, and therefore $m_i \equiv 1 \pmod 2$. Now, from
Theorem 20.3 and Remark 20.1, $\delta$ is a descending endomorphism of $G$ if and only if
$m_i \equiv m_j \pmod 2$ for all $i, j = 1, \ldots, k$, and according to our preceding observation,
this is true if and only if $\delta_i$ maps $s_i$ to $M_i$ either for all $i$ or for no $i$. This concludes
the proof.


20.3 Descending Endomorphisms of Dihedral Groups 


Consider the dihedral group $D_n$ of order $2n$ as defined by the following presentation:
\[ D_n = \langle r, s \mid r^n = s^2 = 1,\ srs = r^{-1} \rangle. \]


Then $D_n$ is a semidirect product of its subgroups $\langle r \rangle \trianglelefteq D_n$ and $\langle s \rangle$, of orders $n$ and
2, respectively. Thus, every element of $D_n$ can be written uniquely in the form $r^i s^j$
for some $i \in \{0, \ldots, n-1\}$ and $j \in \{0, 1\}$.


Remark 20.2 Any endomorphism $\varepsilon$ of $D_n$ can be defined by specifying its action
on the generators $r$ and $s$, and extending the action to all elements $r^a s^b$ as
$\varepsilon(r^a s^b) = (\varepsilon(r))^a (\varepsilon(s))^b$. Such a map is well-defined if it is consistent with the relations in the
presentation of $D_n$, i.e., if $(\varepsilon(r))^n = (\varepsilon(s))^2 = 1$ and $\varepsilon(s)\varepsilon(r)\varepsilon(s) = (\varepsilon(r))^{-1}$. Then
it is an endomorphism by definition.


We will show that the descending endomorphisms of $D_n$ can be classified into two
types of endomorphisms, each defined by two parameters. The possible values of the
parameters for which these endomorphisms are descending depend on whether $n$ is
odd, singly even, or doubly even.


Definition 20.3 For every pair $i, j \in \{0, \ldots, n-1\}$, define maps $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$
on $D_n$ by their actions on the generators $r$ and $s$, as follows:


(i) $\delta^{(1,i,j)}\colon r \mapsto r^i,\ s \mapsto r^{2j} s$,
(ii) $\delta^{(2,i,j)}\colon r \mapsto r^i,\ s \mapsto r^{2j}$.

When $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$ are well-defined endomorphisms of $D_n$, we shall call them
endomorphisms of the first kind and second kind, respectively.


We shall first determine the values of the parameters $i$ and $j$ for which these are
well-defined maps on $D_n$, and hence endomorphisms.
Lemma 20.2 Let $G = D_n$, and let $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$ be the maps on $G$ as given in
Definition 20.3 for $i, j \in \{0, \ldots, n-1\}$. Then the following hold:

(i) For each $n$, $\delta^{(1,i,j)}$ is well-defined for all values of $i$ and $j$.
(ii) If $n$ is odd, $\delta^{(2,i,j)}$ is well-defined if and only if $i = j = 0$.
(iii) If $n \equiv 2 \pmod 4$, then $\delta^{(2,i,j)}$ is well-defined if and only if $i \in \{0, \frac{n}{2}\}$ and
$j \in \{0, \frac{n}{2}\}$.
(iv) If $n \equiv 0 \pmod 4$, then $\delta^{(2,i,j)}$ is well-defined if and only if $i \in \{0, \frac{n}{2}\}$ and
$j \in \{0, \frac{n}{4}, \frac{n}{2}, \frac{3n}{4}\}$.

Proof Note that $\delta$, whether it is of the first kind or the second kind, maps $r$ to $r^i$,
which implies that $(\delta(r))^n = 1$. Thus, it is sufficient to check the other two conditions
given in Remark 20.2.

Firstly, if $\delta = \delta^{(1,i,j)}$, then $(\delta(s))^2 = (r^{2j}s)^2 = 1$ and
$\delta(s)\delta(r)\delta(s) = (r^{2j}s) r^i (r^{2j}s) = r^{-i} = (\delta(r))^{-1}$, for all $i$ and $j$. Hence, $\delta^{(1,i,j)}$ is always
well-defined.

Now let $\delta = \delta^{(2,i,j)}$. Then $(\delta(s))^2 = r^{4j} = 1$ if and only if $4j \equiv 0 \pmod n$, and
$\delta(s)\delta(r)\delta(s) = r^i$, which will be equal to $\delta(r)^{-1} = r^{-i}$ if and only if $2i \equiv 0 \pmod n$.
When $n$ is odd, these two congruences hold only when $i = j = 0$. When $n$ is
even, these imply that $i \in \{0, \frac{n}{2}\}$, and $j \in \{0, \frac{n}{2}\}$ or $j \in \{0, \frac{n}{4}, \frac{n}{2}, \frac{3n}{4}\}$ depending
on whether $n \equiv 2 \pmod 4$ or $n \equiv 0 \pmod 4$, respectively.
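
The parameter lists in Lemma 20.2 can be cross-checked numerically. The sketch below is an illustration only: it encodes $r^a s^b$ as the pair `(a, b)` (an assumed encoding), multiplies pairs using $s r^c = r^{-c} s$, and tests the three relations of Remark 20.2 directly on the proposed images of $r$ and $s$. The helper names are hypothetical.

```python
def dihedral_mul(n):
    """Multiplication on pairs (a, b) standing for r^a s^b in D_n."""
    def mul(g, h):
        (a, b), (c, d) = g, h
        return ((a - c if b else a + c) % n, (b + d) % 2)
    return mul

def respects_relations(n, R, S):
    """Check R^n = S^2 = 1 and S R S = R^{-1} for images R, S of r, s."""
    mul, e = dihedral_mul(n), (0, 0)
    R_to_n = e
    for _ in range(n):
        R_to_n = mul(R_to_n, R)
    # SRS = R^{-1} is tested in the equivalent form (SRS) * R = 1.
    return R_to_n == e and mul(S, S) == e and mul(mul(mul(S, R), S), R) == e

def first_kind_ok(n, i, j):
    return respects_relations(n, (i % n, 0), (2 * j % n, 1))

def second_kind_ok(n, i, j):
    return respects_relations(n, (i % n, 0), (2 * j % n, 0))

def lemma_20_2_parameters(n):
    """Pairs (i, j) listed in Lemma 20.2 (ii)-(iv) for the second kind."""
    if n % 2 == 1:
        return {(0, 0)}
    js = {0, n // 2} if n % 4 == 2 else {0, n // 4, n // 2, 3 * n // 4}
    return {(i, j) for i in (0, n // 2) for j in js}

for n in range(3, 13):
    computed = {(i, j) for i in range(n) for j in range(n) if second_kind_ok(n, i, j)}
    all_first = all(first_kind_ok(n, i, j) for i in range(n) for j in range(n))
    print(n, computed == lemma_20_2_parameters(n), all_first)
```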


Now, to determine the values of $i$ and $j$ for which $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$ are descending
endomorphisms of $D_n$, we use the fact that an endomorphism is descending if and
only if it maps every element to some element in its normal closure (Corollary 20.1).
Observe that in $D_n$, the normal closure of an arbitrary element of the form $r^k$ is the
cyclic subgroup $\langle r^k \rangle$, and the normal closure of an arbitrary element of the form
$r^k s$ is $\langle r^2, r^k s \rangle$, which can be written in the simpler form $\langle r^2, r^t s \rangle$, where $t \in \{0, 1\}$
and $t \equiv k \pmod 2$. In particular, the normal closures of $r$ and $s$ are $\langle r \rangle$ and $\langle r^2, s \rangle$,
respectively. Note that when $n$ is odd, then for all $k$, $\langle r^2, r^k s \rangle$ is $D_n$ itself.


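Theorem 20.6 The descending endomorphisms of $D_n$ are completely described and
enumerated by the following:


(i) If $n \equiv 1 \pmod 2$, then
\[ \operatorname{Des} D_n = \{\delta^{(1,i,j)} \mid 0 \le i \le n-1,\ 0 \le j \le n-1\} \cup \{\delta^{(2,0,0)}\} \]
and $|\operatorname{Des} D_n| = n^2 + 1$.
(ii) If $n \equiv 2 \pmod 4$, then
\[ \operatorname{Des} D_n = \{\delta^{(1,i,j)} \mid i \in \{1, 3, \ldots, n-1\},\ 0 \le j \le \tfrac{n}{2} - 1\} \cup \{\delta^{(2,0,0)}\} \]
and $|\operatorname{Des} D_n| = \frac{n^2}{4} + 1$.
(iii) If $n \equiv 0 \pmod 4$, then
\[ \operatorname{Des} D_n = \{\delta^{(1,i,j)} \mid i \in \{1, 3, \ldots, n-1\},\ 0 \le j \le \tfrac{n}{2} - 1\} \cup \{\delta^{(2,i,j)} \mid i \in \{0, \tfrac{n}{2}\},\ j \in \{0, \tfrac{n}{4}\}\} \]
and $|\operatorname{Des} D_n| = \frac{n^2}{4} + 4$.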


Proof We begin by noting that any descending endomorphism of $D_n$ must necessarily
map $r$ to $r^i$ for some $i \in \{0, \ldots, n-1\}$, and $s$ to either $r^{2j}s$ or $r^{2j}$ for some
$j \in \{0, \ldots, n-1\}$. This is evident from the structure of the normal closures of $r$ and
$s$. It is also clear that when $n$ is odd, $r^{2j}$ has $n$ distinct values for $j = 0, \ldots, n-1$,
whereas when $n$ is even, $r^{2j}$ has exactly $\frac{n}{2}$ distinct values, which are obtained by
taking $j = 0, \ldots, \frac{n}{2} - 1$. Therefore, to determine $\operatorname{Des} D_n$, it is sufficient to consider
only maps of the form $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$ for the values of $i$ and $j$ in the ranges specified.

Now, by considering the normal closure of an arbitrary element of $D_n$, we show
that the values of $i$ and $j$ for which $\delta^{(1,i,j)}$ and $\delta^{(2,i,j)}$ are descending, in each of the
cases given in (i), (ii), and (iii), are indeed as claimed. Both kinds of endomorphisms
define power maps on the subgroup $\langle r \rangle$, and hence map its elements to their normal
closures. Thus, we need to consider only an arbitrary element of the form $r^k s$. Recall
that the normal closure of $r^k s$ can be written as $\langle r^2, r^t s \rangle$, where $t \in \{0, 1\}$ and
$t \equiv k \pmod 2$.

First, consider $\delta = \delta^{(1,i,j)}$, applied to the element $rs$, which gives
$\delta(r)\delta(s) = r^i r^{2j} s = r^{i+2j} s$. If $\delta$ is descending, the values of $i$ and $j$ should be such that
$r^{i+2j}s \in \langle r^2, rs \rangle$, the normal closure of $rs$. If $n$ is odd, then $\langle r^2, rs \rangle = D_n$, and
therefore $i$ and $j$ can take any value. If $n$ is even, then $r^{i+2j}s \in \langle r^2, rs \rangle$ only if $i + 2j$ is
odd, i.e., $i$ is odd. To see that this is sufficient to make $\delta$ descending, observe that
$\delta(r^k s) = r^{ik+2j}s \in \langle r^2, r^t s \rangle$, where $t \in \{0, 1\}$ and $t \equiv k \pmod 2$, if and only if $i$ is
odd. This proves that the values of $i$ and $j$ must be as given, in each case, for $\delta$ of
the first kind.

Next, consider $\delta = \delta^{(2,i,j)}$, which when applied to $rs$ gives $\delta(r)\delta(s) = r^i r^{2j} = r^{i+2j}$.
From Lemma 20.2, we know that $\delta$ is trivial if $n$ is odd. This completes the proof
of (i). Now let $n$ be even. Recall that $i \in \{0, \frac{n}{2}\}$, and $j \in \{0, \frac{n}{2}\}$ or $j \in \{0, \frac{n}{4}, \frac{n}{2}, \frac{3n}{4}\}$
accordingly as $n \equiv 2 \pmod 4$ or $n \equiv 0 \pmod 4$. Now $r^{i+2j}$ belongs to the normal
closure of $rs$, namely $\langle r^2, rs \rangle$, only if $i + 2j$ is even, and hence $i$ is even. In this
case, $\delta(r^k s) = r^{ik+2j}$, which belongs to $\langle r^2, r^t s \rangle$, where $t \in \{0, 1\}$ and $t \equiv k \pmod 2$.
Thus, $\delta$ is descending if and only if $i$ is even. If $n \equiv 2 \pmod 4$, then $\frac{n}{2}$ is odd, and
this forces $i$ to be 0, which proves (ii). If $n \equiv 0 \pmod 4$, then both $\frac{n}{2}$ and 0 are even,
and hence $i$ can assume both these values.

As noted, the respective ranges given for $i$ and $j$ define distinct descending
endomorphisms of $D_n$, which immediately lead to the enumerations of $|\operatorname{Des} D_n|$ given
in (i), (ii), and (iii).
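
The enumeration can be compared with an exhaustive search for small $n$. The sketch below is illustrative and independent of the proof: it encodes $r^a s^b$ as `(a, b)` (an assumed encoding), tries every pair of images for the generators, keeps those that define a homomorphism, and applies the normal-closure criterion of Corollary 20.1. The function names are hypothetical.

```python
from itertools import product

def dihedral(n):
    """Elements (a, b) stand for r^a s^b; returns the element list and product."""
    elems = [(a, b) for a in range(n) for b in range(2)]
    def mul(g, h):
        (a, b), (c, d) = g, h
        return ((a - c if b else a + c) % n, (b + d) % 2)
    return elems, mul

def generated(gens, mul, e):
    """Subgroup generated by gens (products suffice in a finite group)."""
    elems, frontier = {e}, [e]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = mul(g, h)
            if gh not in elems:
                elems.add(gh)
                frontier.append(gh)
    return elems

def count_descending(n):
    """Number of endomorphisms of D_n sending each x into its normal closure."""
    elems, mul = dihedral(n)
    e = (0, 0)
    inv = {g: next(h for h in elems if mul(g, h) == e) for g in elems}
    closure = {x: generated([mul(mul(g, x), inv[g]) for g in elems], mul, e)
               for x in elems}
    def power(g, k):
        out = e
        for _ in range(k):
            out = mul(out, g)
        return out
    count = 0
    for R, S in product(elems, repeat=2):
        # Candidate map r^a s^b -> R^a S^b, kept only if it is a homomorphism.
        f = {(a, b): mul(power(R, a), power(S, b)) for (a, b) in elems}
        is_hom = all(f[mul(g, h)] == mul(f[g], f[h]) for g in elems for h in elems)
        if is_hom and all(f[x] in closure[x] for x in elems):
            count += 1
    return count

def theorem_20_6_count(n):
    """Value of |Des D_n| stated in Theorem 20.6."""
    if n % 2 == 1:
        return n * n + 1
    return n * n // 4 + (1 if n % 4 == 2 else 4)

for n in range(3, 9):
    print(n, count_descending(n), theorem_20_6_count(n))
```

The search is quadratic in the number of candidate maps and in the group order, so it is only meant for small $n$.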
We can now use results on direct products to compute the descending
endomorphisms of all finite direct products of dihedral groups. In order to apply Theorem
20.2, we need to determine, for every $N \trianglelefteq D_n$, the quotient $\bar N = N / [D_n, N]$ as well
as, for every $\delta \in \operatorname{Des} D_n$, the restriction $\bar\delta|_{\bar N}$ of the descent $\bar\delta \in \operatorname{Des}(D_n / [D_n, N])$.
Then, in the direct product $\prod_{i=1}^{k} D_{n_i}$, for every normal subgroup $N_i$ of each factor
$D_{n_i}$ and every $\delta_i \in \operatorname{Des} D_{n_i}$, we may use the results of this general computation
in conjunction with Theorem 20.2, to identify all the pointwise products of the $\delta_i$ that
define descending endomorphisms of the direct product.

Let $1$ denote the trivial group, $C_2$ the cyclic group of order 2, and $C_2^2 = C_2 \times C_2$
the Klein 4-group.


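Lemma 20.3 If $G = D_n$ and $N \trianglelefteq G$, then $N / [G, N] \cong 1$, $C_2$, or $C_2^2$.
If $N / [G, N] \cong C_2$, then one of the following holds:


(i) $N \le \langle r \rangle$, $n$ is even, and $[G, N] = \langle x^2 \mid x \in N \rangle$.
(ii) $N = \langle r^2, s \rangle$ or $\langle r^2, rs \rangle$, $n$ is even, and $[G, N] = \langle r^2 \rangle$.
(iii) $N = G$ and $n$ is odd, and $[G, N] = \langle r \rangle$.


If $N / [G, N] \cong C_2^2$, then $n$ is even, $N = G$, and $[G, N] = \langle r^2 \rangle$.


Proof Observe that the commutator of any two elements $r^i s^t$ and $r^j s^u$, i.e.,
$[r^i s^t, r^j s^u]$, is an element of the form $r^{2\alpha i + 2\beta j}$, where $\alpha$ and $\beta$ may be $0$, $1$, or $-1$
depending on the values of $t$ and $u$. Hence, for any $N \trianglelefteq G$, $[G, N] \le \langle r^2 \rangle$. Now we consider
all non-trivial normal subgroups $N$.

If $N \le \langle r \rangle$, then the general form of the commutator shows that
$[G, N] = \langle x^2 \mid x \in N \rangle$. In this case, $|N : [G, N]|$ is 2 if $|N|$ is even and 1 if $|N|$ is odd;
in particular, it can be 2 only if $n$ is even.

If $N = \langle r^2, s \rangle$ or $\langle r^2, rs \rangle$, then $[G, N] = \langle r^2 \rangle$, since $[r, s] = [r, rs] = r^2 \in [G, N] \le \langle r^2 \rangle$.
In this case, if $n$ is even, then $\langle r^2 \rangle < N < G$ and $|N : [G, N]| = 2$,
whereas if $n$ is odd, then $N = G$ and $\langle r^2 \rangle = \langle r \rangle$, which again implies that
$|N : [G, N]| = 2$.

The only remaining case is $N = G$ with $n$ even. In this case, $[G, N] = \langle r^2 \rangle < \langle r \rangle$,
by the same argument as in the previous case, and then $N / [G, N]$ is the quotient
group isomorphic to $C_2^2$ generated by $s[G, N] = s\langle r^2 \rangle$ and $rs[G, N] = rs\langle r^2 \rangle$.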


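Let $G = D_n$, $N \trianglelefteq G$, and $\delta \in \operatorname{Des} D_n$. Following the notation of Theorem 20.2,
$\bar\delta$ denotes the descent of $\delta$ to $\bar G = G / [G, N]$, and $\bar\delta|_{\bar N}$ denotes the restriction of $\bar\delta$
to the subgroup $\bar N = N / [G, N]$. $\bar N$ is central in $\bar G$ and hence $\bar\delta|_{\bar N} \in \operatorname{Des} \bar N$. From
Lemma 20.3, we know that $\bar N$ is $1$, $C_2$, or $C_2^2$. Thus, $\bar\delta|_{\bar N} \in \operatorname{Des} \bar N$ must be either
the trivial or the identity endomorphism. Observing that $\bar\delta|_{\bar N}$ is trivial if and only if
$\delta(N) \le [G, N]$, we obtain the following corollary to Lemma 20.3.


Corollary 20.3 Let $G = D_n$, $\delta = \delta^{(t,i,j)} \in \operatorname{Des} G$, for some $i$ and $j$, and $t \in \{1, 2\}$.
Let $N \trianglelefteq G$.

(i) If $N \le \langle r \rangle$, then $\bar\delta|_{\bar N}$ is trivial if and only if $i$ is even.

(ii) If $N = \langle r^2, s \rangle$, $\langle r^2, rs \rangle$, or $G$, then $\bar\delta|_{\bar N}$ is trivial if and only if $t = 2$.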
Using the cases listed in Theorem 20.6, we can now show that $\bar\delta|_{\bar N}$ is the identity
or trivial endomorphism accordingly as $\delta$ is of the first or second kind.

First, suppose that $\delta$ is of the first kind. If $N$ is not contained in $\langle r \rangle$, then by
Corollary 20.3, $\bar\delta|_{\bar N}$ is the identity endomorphism of $\bar N$. If $N \le \langle r \rangle$, then we have
two cases depending on the value of $n$. If $n$ is odd, then $\bar N = 1$, and hence $\bar\delta|_{\bar N}$ is
the identity endomorphism of the trivial group. If $n$ is even, then by Theorem 20.6,
$\delta = \delta^{(1,i,j)}$ where $i$ is odd, and hence by Corollary 20.3, $\bar\delta|_{\bar N}$ is again the identity
endomorphism.

Suppose on the other hand that $\delta$ is of the second kind. Then by reasoning along
the same lines, we see that $\bar\delta|_{\bar N}$ must be trivial in all cases. Note that when $n$ is odd
and $N \le \langle r \rangle$, $\bar N = 1$ as in the previous case, but the identity endomorphism of the
trivial group is the same as its trivial endomorphism. Thus, we obtain the following
second corollary to Theorem 20.6 and Lemma 20.3.


Corollary 20.4 Let $G = D_n$ and $\delta \in \operatorname{Des} G$. Then for any $N \trianglelefteq G$, $\bar\delta|_{\bar N}$ is the identity
endomorphism of $\bar N$ if $\delta$ is of the first kind, and is the trivial endomorphism of $\bar N$ if
$\delta$ is of the second kind.


Applying Corollary 20.4, we prove that the descending endomorphisms of any 
finite direct product of dihedral groups are the pointwise products of descending 
endomorphisms of the same kind. 


Theorem 20.7 Let $G = \prod_{k=1}^{m} G_k$, where $G_k = D_{n_k}$, $k = 1, \ldots, m$. Then the
descending endomorphisms of $G$ are all the pointwise products of the form $\prod_{k=1}^{m} \delta_k$
where each $\delta_k \in \operatorname{Des} G_k$ and the $\delta_k$ are all of the same kind.


Proof We know, from Theorem 20.2, that every descending endomorphism $\delta$ of
$G$ is a pointwise product $\delta = \prod_{k=1}^{m} \delta_k$ of descending endomorphisms $\delta_k$ of $G_k$,
$k = 1, \ldots, m$, and that any such pointwise product is a descending endomorphism
precisely when, for all $N_k \trianglelefteq G_k$, $k = 1, \ldots, m$, it holds that $\prod_{k=1}^{m} \bar\delta_k|_{\bar N_k}$ is a
descending endomorphism of $\prod_{k=1}^{m} \bar N_k$. Let $\bar N$ denote $\prod_{k=1}^{m} \bar N_k$.

From Lemma 20.3, we know that for each $k$, $\bar N_k$ is $1$, $C_2$, or $C_2^2$. Thus, $\bar N \cong C_2^l$
for some non-negative integer $l$. In particular, when $N_k = G_k$ for at least one
$k \in \{1, \ldots, m\}$, $\bar N \cong C_2^l$ for some $l > 0$, since $\bar N_k$ is $C_2$ or $C_2^2$. Thus, $\prod_{k=1}^{m} \bar\delta_k|_{\bar N_k}$
is descending if and only if it is either the trivial endomorphism or the identity
endomorphism of $\bar N$, or equivalently, $\bar\delta_k|_{\bar N_k}$ is trivial either for all $k$, or for no $k$.
From Corollary 20.4, this is equivalent to the requirement that all $\delta_k$ be of the same
kind.


20.4 Descending Endomorphisms of Hamiltonian Groups 


A Hamiltonian group is a non-Abelian group in which every subgroup is normal. 
The smallest Hamiltonian group is the quaternion group $Q_8$ of order 8. Reinhold




Baer, in [1], showed that a group G is Hamiltonian if and only if it is a direct prod- 
uct of the quaternion group, an elementary Abelian 2-group, and a torsion Abelian 
group in which every element has odd order (for proof of this result, see [8, Theorem 
9.7.4]). Applying Theorems 20.2 and 20.3, we shall now determine all descending 
endomorphisms of Hamiltonian groups. Observe that since all subgroups of a Hamil- 
tonian group are normal, every descending endomorphism of such a group is a power 
endomorphism, and vice versa. 

The quaternion group is a non-Abelian group of order 8, defined by
$Q_8 = \{1, -1, i, -i, j, -j, k, -k\}$ where $i^2 = j^2 = k^2 = ijk = -1$ and $1$ is the identity
element. The set $\{i, j\}$ is a generating set of $Q_8$. Therefore, we may define all
endomorphisms of $Q_8$ by specifying their actions on $i$ and $j$. If $\tau \in \operatorname{Des} Q_8$, then it must
be a power endomorphism, as noted earlier, and hence is completely defined by the
powers of $i$ and $j$ to which it maps them.


Definition 20.4 For $u, v \in \{0, 1, 2, 3\}$, define $\tau^{(u,v)}$ to be the unique endomorphism
of $Q_8$ such that $\tau^{(u,v)}(i) = i^u$ and $\tau^{(u,v)}(j) = j^v$, provided it is well-defined.


We can determine $\operatorname{Des} Q_8$ by a simple and direct computation. We present the
result below, omitting the details (also see Theorem 20.9 and Remark 20.3).


Proposition 20.1 The quaternion group $Q_8$ has eight descending endomorphisms,
given by
\[ \operatorname{Des} Q_8 = \{\tau^{(u,v)} \mid u, v \in \{0, 1, 2, 3\},\ u \equiv v \pmod 2\}. \]
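
The direct computation behind Proposition 20.1 can be sketched as follows. As an assumption of the sketch, $Q_8$ is modelled as the dicyclic group $\mathrm{Dic}_2$ with $x = i$ and $y = j$, and the element $x^a y^b$ is stored as the pair `(a, b)`. Because every subgroup of $Q_8$ is normal, the descending endomorphisms are exactly the power endomorphisms, which is what the code tests. All names are illustrative.

```python
from itertools import product

N = 2                                   # Dic_2 is the quaternion group Q_8
ELEMS = [(a, b) for a in range(2 * N) for b in range(2)]  # x^a y^b, x = i, y = j
E = (0, 0)

def mul(g, h):
    """Product in Dic_N, using y x^c = x^{-c} y and y^2 = x^N."""
    (a, b), (c, d) = g, h
    return ((a + (-c if b else c) + N * b * d) % (2 * N), (b + d) % 2)

def power(g, k):
    out = E
    for _ in range(k):
        out = mul(out, g)
    return out

def extend(X, Y):
    """Map x^a y^b -> X^a Y^b; return it if it is an endomorphism, else None."""
    f = {(a, b): mul(power(X, a), power(Y, b)) for (a, b) in ELEMS}
    ok = all(f[mul(g, h)] == mul(f[g], f[h]) for g in ELEMS for h in ELEMS)
    return f if ok else None

def is_power_endomorphism(f):
    """Every element is sent to one of its own powers (orders divide 4)."""
    return all(f[g] in {power(g, k) for k in range(4)} for g in ELEMS)

I, J = (1, 0), (0, 1)

# Pairs (u, v) for which tau^(u,v) is a well-defined power endomorphism.
descending_uv = []
for u, v in product(range(4), repeat=2):
    f = extend(power(I, u), power(J, v))
    if f is not None and is_power_endomorphism(f):
        descending_uv.append((u, v))

# Independent count over all possible images of the two generators.
total = 0
for X, Y in product(ELEMS, repeat=2):
    f = extend(X, Y)
    if f is not None and is_power_endomorphism(f):
        total += 1

print(descending_uv, total)
```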


Note that every descending endomorphism of $Q_8$ is a power endomorphism that
either maps all elements to odd powers or maps all elements to even powers. Also
note that $[Q_8, N] = \langle -1 \rangle$ for each normal subgroup $N$ of $Q_8$ that is not contained in
the centre $\langle -1 \rangle$, while $[Q_8, \langle -1 \rangle] = 1$. Hence $\bar N = N / [Q_8, N]$ is $C_2$ (for $N = \langle -1 \rangle$,
$\langle i \rangle$, $\langle j \rangle$, or $\langle k \rangle$) or $C_2^2$ (for $N = Q_8$).
Thus, for all $\tau \in \operatorname{Des} Q_8$, $\bar\tau|_{\bar N}$ (the descent of $\tau$ to $Q_8 / [Q_8, N]$, restricted to $\bar N$)
is either the identity automorphism or the trivial endomorphism of $\bar N$. In fact, we
observe the following.


Lemma 20.4 Let $\tau \in \operatorname{Des} Q_8$. Then for any non-trivial $N \trianglelefteq Q_8$, $\bar\tau|_{\bar N}$ is the identity
automorphism of $\bar N$ if $\tau$ is an odd-power endomorphism of $Q_8$, and is the trivial
endomorphism of $\bar N$ if $\tau$ is an even-power endomorphism of $Q_8$.


The following result shows that the condition in Theorem 20.3 can be simplified 
in the case of a direct product where one factor is Abelian. 


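Lemma 20.5 Let $G = G_1 \times G_2$, where $G_2$ is Abelian, and let $\delta_i \in \operatorname{Des} G_i$, $i = 1, 2$.
Then $\delta_1 \times \delta_2 \in \operatorname{Des} G$ if and only if, for all $N_1 \trianglelefteq G_1$,
$\bar\delta_1|_{\bar N_1} \times \delta_2 \in \operatorname{Des}(\bar N_1 \times G_2)$.


Proof The implication in one direction is obvious. For the converse, suppose that for
all $N_1 \trianglelefteq G_1$, $\bar\delta_1|_{\bar N_1} \times \delta_2 \in \operatorname{Des}(\bar N_1 \times G_2)$. To prove that $\delta_1 \times \delta_2 \in \operatorname{Des}(G_1 \times G_2)$, we
need to show that for all $N_1 \trianglelefteq G_1$ and $N_2 \trianglelefteq G_2$,
$(\bar\delta_1 \times \bar\delta_2)|_{\bar N_1 \times \bar N_2} \in \operatorname{Des}(\bar N_1 \times \bar N_2)$
(Theorem 20.2). But as $G_2$ is Abelian, for each $N_2 \le G_2$, $\bar N_2 = N_2$, and hence
$(\bar\delta_1 \times \bar\delta_2)|_{\bar N_1 \times \bar N_2}$ is the restriction of $\bar\delta_1|_{\bar N_1} \times \delta_2$ to $\bar N_1 \times \bar N_2 = \bar N_1 \times N_2$. Again, since
$\bar N_1 \times G_2$ is Abelian, the restriction of $\bar\delta_1|_{\bar N_1} \times \delta_2 \in \operatorname{Des}(\bar N_1 \times G_2)$ to any subgroup is
a descending endomorphism of the subgroup, completing the proof.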


Lemma 20.5 can be applied to Hamiltonian groups, since they are direct products 
of the non-Abelian group Qg with an Abelian group, the latter itself being the direct 
product of two Abelian groups. 


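Theorem 20.8 Let $G$ be a Hamiltonian group with direct product decomposition
$G = G_1 \times G_2 \times G_3$, where $G_1 = Q_8$, $G_2$ is an elementary Abelian 2-group, and $G_3$ is torsion
Abelian with all elements having odd order. Let $\mathrm{id}_{G_2}$ and $0_{G_2}$ denote the identity
automorphism and trivial endomorphism, respectively, of $G_2$. Then

\[
\operatorname{Des} G = \{\tau^{(u,v)} \times 0_{G_2} \times \delta_3 \mid u, v \in \{0, 2\},\ \delta_3 \in \operatorname{Des} G_3\} \cup
\{\tau^{(u,v)} \times \mathrm{id}_{G_2} \times \delta_3 \mid u, v \in \{1, 3\},\ \delta_3 \in \operatorname{Des} G_3\}
\]

where $\operatorname{Des} G_3$ is the set of all power endomorphisms of $G_3$. If $G_3$ has finite exponent
$n$, then $|\operatorname{Des} G| = 8n$.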


Proof We know, from Proposition 20.1, that the descending endomorphisms of $G_1 = Q_8$
are of the form $\tau^{(u,v)}$ where $0 \le u, v \le 3$ and $u \equiv v \pmod 2$, and that these are
power endomorphisms that map all elements of $G_1$ to odd or even powers according
to whether $u$ and $v$ are both odd or both even, respectively.

Now consider the torsion Abelian group $A = G_2 \times G_3$, all of whose descending
endomorphisms are power endomorphisms. By Lemma 20.5, the descending
endomorphisms of $G = G_1 \times A$ are all the endomorphisms of the form $\delta_1 \times \theta$, with
$\delta_1 \in \operatorname{Des} G_1$ and $\theta \in \operatorname{Des} A$, such that for all $N_1 \trianglelefteq G_1$, $\bar\delta_1|_{\bar N_1} \times \theta \in \operatorname{Des}(\bar N_1 \times A)$.

We know that for all non-trivial $N_1 \trianglelefteq G_1 = Q_8$, $\bar\delta_1|_{\bar N_1}$ is the identity automorphism
if $u$ and $v$ are odd and otherwise it is the trivial endomorphism. Hence
$\bar\delta_1|_{\bar N_1} \times \theta \in \operatorname{Des}(\bar N_1 \times A)$ if and only if either $\delta_1 = \tau^{(u,v)}$ with $u$ and $v$ odd and $\theta$
is a power endomorphism of $A$ mapping all elements to odd powers, or $\delta_1 = \tau^{(u,v)}$
with $u$ and $v$ even and $\theta$ is a power endomorphism of $A$ mapping all elements to even
powers (by Theorem 20.3). Since $A = G_2 \times G_3$, every power endomorphism of $A$,
i.e., every descending endomorphism of $A$, is of the form $\delta_2 \times \delta_3$ where $\delta_2$ is either
$\mathrm{id}_{G_2}$ or $0_{G_2}$ (the only descending endomorphisms of an elementary Abelian 2-group)
and $\delta_3$ is a power endomorphism of $G_3$. As every element of $G_3$ has odd order, any
power endomorphism $\delta_3$ of $G_3$ can be expressed as an odd-power endomorphism as
well as an even-power endomorphism. Hence, the condition in Theorem 20.3 always
holds for any $\delta_2 \times \delta_3$, and $\theta = \delta_2 \times \delta_3$ is a power endomorphism of $A = G_2 \times G_3$
that maps all elements to odd powers if $\delta_2 = \mathrm{id}_{G_2}$ and even powers if $\delta_2 = 0_{G_2}$.
Then, the result follows on the application of Theorem 20.3 to $\delta_1 \times \theta$. If $G_3$ has
finite exponent $n$, then $|\operatorname{Des} G_3| = n$, and hence $|\operatorname{Des} G| = 8n$.
20.5 Descending Endomorphisms of Dicyclic Groups


In this section, we determine the descending endomorphisms of dicyclic groups.
These groups are similar in structure to dihedral groups, and this similarity is reflected
in their descending endomorphisms as well. The dicyclic group of order $4n$ is defined
by the presentation

\[ \mathrm{Dic}_n = \langle x, y \mid x^{2n} = 1,\ x^n = y^2,\ yxy^{-1} = x^{-1} \rangle. \]

The element $x^n = y^2$ is the unique involution of $\mathrm{Dic}_n$, the centre of $\mathrm{Dic}_n$ is $\langle x^n \rangle$, and
$\mathrm{Dic}_n / \langle x^n \rangle \cong D_n$ [7]. In the lemma given below, we make some observations about
$\mathrm{Dic}_n$ that can be easily verified from the given presentation.


Lemma 20.6 Consider the dicyclic group $\mathrm{Dic}_n$.


(i) Every element of $\mathrm{Dic}_n$ can be written uniquely in the form $x^u y^v$ where
$0 \le u \le 2n - 1$ and $0 \le v \le 1$.
(ii) The normal closure of any element of $\mathrm{Dic}_n$ of the form $x^u$ is the cyclic subgroup
$\langle x^u \rangle$.
(iii) The normal closure of any element of $\mathrm{Dic}_n$ of the form $x^u y$ is $\langle x^2, x^u y \rangle$. If $n$ is
odd, then this is equal to $\mathrm{Dic}_n$ itself, and if $n$ is even, then it is equal to $\langle x^2, y \rangle$
or $\langle x^2, xy \rangle$ depending on whether $u$ is even or odd, respectively.


As in the case of dihedral groups, the descending endomorphisms of dicyclic
groups can also be classified into two types in terms of their action on the two
generators. Let $\eta^{(1,i,j)}$ and $\eta^{(2,i,j)}$ be maps on $\mathrm{Dic}_n$ defined by the prescription


$$\eta^{(1,i,j)} : x \mapsto x^i, \quad y \mapsto x^j y,$$


$$\eta^{(2,i,j)} : x \mapsto x^i, \quad y \mapsto x^j,$$


where $0 \le i, j \le 2n - 1$.


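Lemma 20.7 The map $\eta^{(1,i,j)}$ defines an endomorphism of $\mathrm{Dic}_n$ if and only if $i$ is
odd, and the map $\eta^{(2,i,j)}$ defines an endomorphism of $\mathrm{Dic}_n$ if and only if $i \equiv 0 \bmod n$
and $2j \equiv in \bmod 2n$.


Proof If $\delta = \eta^{(1,i,j)}$ or $\eta^{(2,i,j)}$, then $\delta$ defines an endomorphism of $\mathrm{Dic}_n$ if and only
if the images of $x$ and $y$ under $\delta$ satisfy the relations given in the presentation of
$\mathrm{Dic}_n$, i.e., $\delta(x)^{2n} = 1$, $\delta(x)^n = \delta(y)^2$, and $\delta(y)\delta(x)\delta(y)^{-1} = \delta(x)^{-1}$. First, observe
that for $\delta = \eta^{(1,i,j)}$ as well as $\delta = \eta^{(2,i,j)}$, $\delta(x)^{2n} = 1$.

Now, consider $\delta = \eta^{(1,i,j)}$. Then $\delta(y)^2 = \delta(x)^n$ reduces to $x^n = x^{in}$, which holds
if and only if $i \equiv 1 \bmod 2$. Clearly, $\delta(y)\delta(x)\delta(y)^{-1} = \delta(x)^{-1}$ holds for all values
of $i$ and $j$.

Next, consider $\delta = \eta^{(2,i,j)}$. Then $\delta(y)^2 = \delta(x)^n$ reduces to $x^{2j} = x^{in}$, which is
equivalent to $2j \equiv in \bmod 2n$, and $\delta(y)\delta(x)\delta(y)^{-1} = \delta(x)^{-1}$ reduces to $x^i = x^{-i}$,
which holds if and only if $i \equiv 0 \bmod n$.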
Theorem 20.9 The descending endomorphisms of $\mathrm{Dic}_n$ are described and enumerated
by the following:


(i) If $n$ is odd, then

$$\operatorname{Des} \mathrm{Dic}_n = \{\eta^{(1,i,j)} \mid i \in \{1, 3, \ldots, 2n - 1\},\ 0 \le j \le 2n - 1\} \cup \{\eta^{(2,0,0)}, \eta^{(2,0,n)}\}$$

and $|\operatorname{Des} \mathrm{Dic}_n| = 2n^2 + 2$.
(ii) If $n$ is even, then

$$\operatorname{Des} \mathrm{Dic}_n = \{\eta^{(1,i,j)} \mid i \in \{1, 3, \ldots, 2n - 1\},\ j \in \{0, 2, \ldots, 2n - 2\}\} \cup \{\eta^{(2,i,j)} \mid i, j \in \{0, n\}\}$$

and $|\operatorname{Des} \mathrm{Dic}_n| = n^2 + 4$.


Proof First, observe that $\eta^{(1,i,j)}$ and $\eta^{(2,i,j)}$ leave all subgroups of $\langle x \rangle$ invariant. To
check if they are descending endomorphisms, therefore, we only need to verify that
they map any element of the form $x^t y$ to an element in its normal closure.

Suppose that $n$ is odd. Then by Lemma 20.6, the normal closure of any element of
the form $x^t y$ is $\mathrm{Dic}_n$ itself, and hence every well-defined endomorphism of the form
$\eta^{(1,i,j)}$ or $\eta^{(2,i,j)}$ is a descending endomorphism of $\mathrm{Dic}_n$. From Lemma 20.7, we know
that $\eta^{(1,i,j)}$ defines an endomorphism of $\mathrm{Dic}_n$ if and only if $i$ is odd, and $\eta^{(2,i,j)}$ defines
an endomorphism if and only if $i \equiv 0 \bmod n$ and $2j \equiv in \bmod 2n$. Since $n$ is odd,
these two congruences hold simultaneously exactly when $i = 0$ and $j \in \{0, n\}$. This
proves (i).

Now, suppose that $n$ is even. Recall that the normal closure of $x^t y$ is $\langle x^2, x^t y \rangle$.
Consider $\delta = \eta^{(1,i,j)}$. From Lemma 20.7, $i$ is necessarily odd. We know that $D_n =
\mathrm{Dic}_n/\langle x^n \rangle$ is the dihedral group of order $2n$ generated by the rotation $\bar{x}$ and the
reflection $\bar{y}$ (the cosets of $\langle x^n \rangle$ with respect to $x$ and $y$, respectively). If $\delta = \eta^{(1,i,j)} \in
\operatorname{Des} \mathrm{Dic}_n$, then it descends to $\bar{\delta} \in \operatorname{Des} D_n$ which maps $\bar{x}$ to $\bar{x}^i$ and $\bar{y}$ to $\bar{x}^j \bar{y}$. Then
Theorem 20.6 implies that $j$ is even. To see that this condition is sufficient for $\delta$
to be a descending endomorphism of $\mathrm{Dic}_n$, observe that $\delta(x^t y) = x^{it+j} y$, and since
$i$ is odd and $j$ is even, $it + j \equiv t \bmod 2$, which implies that $x^{it+j} y \in \langle x^2, x^t y \rangle$,
the normal closure of $x^t y$. Thus, $\delta$ maps every element of $\mathrm{Dic}_n$ to an element in its
normal closure.

Next, consider $\delta = \eta^{(2,i,j)}$. From Lemma 20.7, $\delta$ defines an endomorphism of
$\mathrm{Dic}_n$ if and only if $i \in \{0, n\}$ and $2j \equiv in \bmod 2n$. Since $n$ is even, so is $i$ (from
the first condition), and hence the second condition reduces to $j \equiv 0 \bmod n$. Thus,
$j \in \{0, n\}$. To see that $\delta = \eta^{(2,i,j)}$ is a descending endomorphism of $\mathrm{Dic}_n$ for all $i, j \in
\{0, n\}$, we verify that $\delta(x^t y)$ is in the normal closure of $x^t y$. Indeed, $\delta(x^t y) = x^{it+j}$,
and since both $i$ and $j$ are even, $x^{it+j} \in \langle x^2, x^t y \rangle$, the normal closure of $x^t y$. This
proves (ii).
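
The statements of Lemma 20.7 and Theorem 20.9 can be checked by brute force for small $n$. The following Python sketch is only an illustration and is not part of the original argument: it assumes the encoding of $x^u y^v$ as a pair `(u, v)`, and all function names (`dicyclic`, `extend`, `normal_closure`, `count_descending`, and so on) are hypothetical helpers introduced here.

```python
def dicyclic(n):
    """Return (elements, mul) for Dic_n; the pair (u, v) stands for x^u y^v."""
    m = 2 * n
    elems = [(u, v) for u in range(m) for v in (0, 1)]

    def mul(a, b):
        (u1, v1), (u2, v2) = a, b
        # y x = x^{-1} y flips the sign of u2 when v1 = 1; y^2 = x^n adds n.
        u = u1 + (-u2 if v1 else u2) + (n if v1 and v2 else 0)
        return (u % m, (v1 + v2) % 2)

    return elems, mul


def power(mul, g, k):
    result = (0, 0)  # identity element
    for _ in range(k):
        result = mul(result, g)
    return result


def extend(n, mul, img_x, img_y):
    """Send the normal form x^u y^v to img_x^u * img_y^v."""
    return {(u, v): mul(power(mul, img_x, u), power(mul, img_y, v))
            for u in range(2 * n) for v in (0, 1)}


def is_endomorphism(elems, mul, phi):
    return all(phi[mul(a, b)] == mul(phi[a], phi[b]) for a in elems for b in elems)


def normal_closure(elems, mul, g):
    inverse = {a: b for a in elems for b in elems if mul(a, b) == (0, 0)}
    conjugates = {mul(mul(h, g), inverse[h]) for h in elems}
    closure, frontier = {(0, 0)}, [(0, 0)]
    while frontier:  # close up under multiplication by the conjugates of g
        a = frontier.pop()
        for c in conjugates:
            p = mul(a, c)
            if p not in closure:
                closure.add(p)
                frontier.append(p)
    return closure


def candidate_maps(n):
    """Yield (kind, i, j, image of x, image of y) for eta^(1,i,j) and eta^(2,i,j)."""
    for i in range(2 * n):
        for j in range(2 * n):
            yield 1, i, j, (i, 0), (j, 1)
            yield 2, i, j, (i, 0), (j, 0)


def lemma_20_7_holds(n):
    elems, mul = dicyclic(n)
    for kind, i, j, img_x, img_y in candidate_maps(n):
        if kind == 1:
            predicted = i % 2 == 1
        else:
            predicted = i % n == 0 and (2 * j - i * n) % (2 * n) == 0
        if is_endomorphism(elems, mul, extend(n, mul, img_x, img_y)) != predicted:
            return False
    return True


def count_descending(n):
    elems, mul = dicyclic(n)
    closure = {g: normal_closure(elems, mul, g) for g in elems}
    count = 0
    for _, _, _, img_x, img_y in candidate_maps(n):
        phi = extend(n, mul, img_x, img_y)
        if is_endomorphism(elems, mul, phi) and all(phi[g] in closure[g] for g in elems):
            count += 1
    return count
```

Comparing `count_descending(n)` with $2n^2 + 2$ for odd $n$ and $n^2 + 4$ for even $n$ gives a quick consistency check of the enumeration; the search is exhaustive over the two families of maps only, which suffices because a descending endomorphism must send $x$ into $\langle x \rangle$ and $y$ into its normal closure.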


Remark 20.3 We note that when $n = 2$, $\mathrm{Dic}_n$ becomes $\mathrm{Dic}_2 = Q_8$, the quaternion
group. In the case corresponding to even $n$ in the above proof, the quotient $D_n$ will
be isomorphic to the Klein 4-group. However, the Klein 4-group is defined by the
presentation of the dihedral group $D_n$ for $n = 2$, and is sometimes considered to be
$D_2$. As such, Theorem 20.6 applies to this case and hence the above proof is still
valid for $n = 2$.
Chapter 21
On Circulant Partial Hadamard Matrices
21.1 Introduction


Ryser's conjecture [12] has been an open problem for more than 50 years. It states
that there is no circulant Hadamard matrix of order $n > 4$. It plays a crucial role
in the study of the $r$-row regular circulant partial Hadamard matrices
$r$-$H(k \times n)$ of combinatorial design theory. In this direction, Craigen et al.
[2] have given many results. With the help of particle swarm optimization, Lin et
al. [9] linked circulant partial Hadamard matrices with general difference sets and
generated new $r$-$H(k \times n)$ for $r = 0$ and $r = 2$. This provides one more tool to
analyse circulant partial Hadamard matrices. These matrices have direct application
in the construction of functional magnetic resonance imaging (fMRI) designs.
fMRI is used for monitoring brain activity in response to mental stimuli given to
the test subject [7]. Manjhi and Rana [10] worked on examples of circulant partial
Hadamard matrices using sign patterns and elaborated on the notion of inequivalence
for such matrices.

In this research article, we work on circulant partial Hadamard matrices. All
the results in this paper are proved analytically, with the hope of advancing the
theoretical part of circulant partial Hadamard matrices.
21.2 Preliminaries


21.2.1 Hadamard Matrix 


A Hadamard matrix is a square matrix $H$ of order $n$ with entries $\pm 1$ such that $HH^T = nI_n$. The
first example of a Hadamard matrix was put forward by Sylvester in 1867, and he
observed that if $H$ is a Hadamard matrix then so is

$$\begin{pmatrix} H & H \\ H & -H \end{pmatrix}.$$

In 1893, J. Hadamard showed that the maximum determinant of a square matrix
$H = (h_{ij})$ of order $n$ with $|h_{ij}| \le 1$ is $n^{n/2}$. This maximum determinant value is
attained under the following conditions:


(i) $HH^T = nI_n$, where $n = 1, 2$ or some multiple of 4.
(ii) The entries in $H$ are $\pm 1$.
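
Sylvester's doubling observation is easy to test numerically. The sketch below is an illustration only; the helper names `sylvester_double` and `is_hadamard` are assumptions introduced here, and matrices are stored as plain lists of rows.

```python
def sylvester_double(h):
    """Return the block matrix [[H, H], [H, -H]] for a square matrix h."""
    top = [row + row for row in h]
    bottom = [row + [-x for x in row] for row in h]
    return top + bottom


def is_hadamard(h):
    """Check that h is square, has entries +1/-1 and satisfies H H^T = n I_n."""
    n = len(h)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in h):
        return False
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n) for j in range(n)
    )


# Start from the 1 x 1 Hadamard matrix and double three times (orders 2, 4, 8).
h = [[1]]
for _ in range(3):
    h = sylvester_double(h)
```

Each doubling preserves the Hadamard property because the Gram matrix of the block matrix is block diagonal with blocks $2HH^T$.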


J. Hadamard also gave examples of Hadamard matrices of order 12 and 20. These 
results of Hadamard revealed a famous and challenging conjecture on Hadamard 
matrices that is named Hadamard matrix conjecture [6, 13]. 


Conjecture 21.1 (Hadamard matrix conjecture) If $r$ is a positive integer, then there
exists a Hadamard matrix of order $4r$.


The Hadamard conjecture remains one of the most famous unsolved problems in
mathematics, despite the fact that there are numerous methods for constructing
Hadamard matrices. Currently, the unknown orders of Hadamard matrices below 1000
are 668, 716, 892 and 956; the last new Hadamard matrix, of order 428, was
constructed by Kharaghani et al. [8] in 2005.


21.2.2 Partial Hadamard Matrices


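A partial Hadamard matrix $H(r \times n)$ is a row orthogonal matrix, satisfying $HH^T =
nI_r$, where $I_r$ is the identity matrix of order $r$. The following are some examples of partial
Hadamard matrices:

$$\begin{pmatrix} 1 & 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{pmatrix}.$$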


Partial Hadamard matrices are preserved by the permutations of rows and columns. 
They are also preserved by the multiplication of rows and columns with —1. Two 
partial Hadamard matrices are called equivalent if and only if one is obtained from 
the other by a sequence of operations of permutation of rows or columns and multi- 
plication of rows or columns by —1. Each partial Hadamard matrix is equivalent to 
a matrix in which all the entries in its first row as well as first column are 1’s; such 
form is called the normal form of partial Hadamard matrix [2]. 

By eliminating rows of Hadamard matrices, we can obtain a number of par- 
tial Hadamard matrices but the reverse process is not an easy task. Completion 
of some special forms of partial Hadamard matrices into a Hadamard matrix is
quite interesting. For example, we can easily extend a partial Hadamard matrix
$H((n - 1) \times n) = (x_{ij})$ to a Hadamard matrix $H(n \times n)$ by inserting a new last row
(Xn1sXn2+Xn3s++++Xnn) Such that xp. = 19 tee for all kK = 1, 2,3,...,n. [Since 
(117) xiexee = 1 for all k = 1,2,...,.]. 


In 1977, Hall Jr. [5] showed that a partial Hadamard matrix $H((n - 4) \times n)$
can be extended to a Hadamard matrix of order $n$. In 1978, Verheiden [15] gave further
results in this regard, showing that every partial Hadamard matrix $H((n - 7) \times n)$
can be extended to a Hadamard matrix of order $n$. In 2009, Bulutoglu and
Kaziska [1] showed that the extension of a partial Hadamard matrix $H((n - 8) \times n)$
to a Hadamard matrix of order $n$ is not possible in general.


21.2.3 Row Regular Matrices


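Matrices with constant row sum are called row regular matrices. The following are
some examples of row regular matrices:

$$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} a & b \\ b & a \end{pmatrix}.$$


A matrix with constant row sum $r$ is called an $r$-row regular matrix.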


21.2.4 Circulant Partial Hadamard Matrices 


If we add one more restriction to the partial Hadamard matrix, namely that it must be
a row regular circulant matrix, then such partial Hadamard matrices are known as
circulant partial Hadamard matrices. An $r$-row regular circulant partial Hadamard
matrix with $k$ rows and $n$ columns is denoted by $r$-$H(k \times n)$. If $r$-$H(k \times n)$ is
an $r$-row regular circulant partial Hadamard matrix with $k$ rows and $n$ columns, then
the following are some elementary relations among $r$, $k$, and $n$:


(i) $k > 1 \Rightarrow n$ is even.
(ii) $r \le \frac{n}{2}$.
(iii) $k > 2 \Rightarrow n$ is divisible by 4.


Ryser [12] conjectured that there is no circulant Hadamard matrix of order
greater than 4. It is one of the famous conjectures related to circulant partial Hadamard
matrices. In 2021, Gallardo [4] showed that there is no circulant Hadamard
matrix of order n(n > 4) if fa is not a square-free integer. This gives partial proof 
of Ryser’s conjecture. However, the Ryser conjecture is still an open challenge for 
Mathematicians. In 2022, Manjhi and Rana [11] provided some results associated 
with r — H(k x 2k) and gave a construction method of such matrices. 

In 2013, Craigen et al. [2] put forward some new results on circulant partial
Hadamard matrices. These results strengthen the relationship among $r$, $n$, and $k$
in a circulant partial Hadamard matrix $r$-$H(k \times n)$. They introduced the relation
$r\sqrt{k} \le n$, and with the help of this inequality they showed the following results
on $r$-$H(k \times n)$:


@) r<k. 
Gi) $}<r<n>k=1. 
Gi) =r spSk=4. 


Craigen et al. [2] also gave a table relating the parameters $r$, $k$ and $n$ in a circulant
partial Hadamard matrix $r$-$H(k \times n)$. In this table, they obtained different
values of $k$ up to $r = 32$ and $n = 64$. Some interesting conjectures given by Craigen
et al. [2] are given below.


Conjecture 21.2 Let $c$ be a positive rational number. Among the even values of $r$
such that $rc \equiv 0 \pmod 4$:


(i) The largest value of $k$ such that an $r$-$H(k \times rc)$ exists is a strictly
increasing function of $r$.

(ii) For the largest value of r for whichr — H(k x rc) exist, we must have cC-1l< 
k<c?. 


Conjecture 21.3 A $2$-$H(k \times 2k)$ exists if and only if $k - 1$ is an odd prime power.


As a consequence, we take the opportunity to learn more about partial Hadamard
matrices and circulant partial Hadamard matrices. In the next section (Theorem 21.10),
we give the proof of one part of Conjecture 21.3.


21.3 Main Work


Craigen et al. [2] obtained in 2013, by computer search, a table relating the
parameters $r$, $k$ and $n$ in an $r$-$H(k \times n)$, and some results reflected by the table have
already been established. In order to give a better insight into the table given in [2], we now
prove the following results on $r$-$H(k \times n)$.


21.3.1 New Bounds of r for Circulant Partial Hadamard 
Matrix r — H(k x n) in Terms of n Which Implies 
k=4 


Craigen et al. (Lemma 5 of [2]) gave bounds of r in r — H(k x n) that implies 
k = 4. Here in terms of the following theorem, we have given new bounds of r in 
r — H(k x n) that implies k = 4. 


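Theorem 21.1 Let $r$-$H(k \times n)$ be a circulant partial Hadamard matrix with $k >
1$ and let $r$ divide $n$. Then

(i) $\frac{n}{3} < r \le \frac{n}{2} \Rightarrow k = 4$.

(ii) If $r = \frac{n}{2} - c$, $c \ge 0$ and $n > 6c$, then $k = 4$.

Proof (i) $k \ne 1 \Rightarrow r \le \frac{n}{2}$. Since $n$ is a multiple of $r$, $\frac{n}{r}$ is an integer. Now $\frac{n}{3} < r \le
\frac{n}{2} \Rightarrow 2 \le \frac{n}{r} < 3 \Rightarrow \frac{n}{r} = 2 \Rightarrow r = \frac{n}{2} \Rightarrow k = 4$.

(ii) We have $r = \frac{n}{2} - c$ and $c < \frac{n}{6}$. Since $\frac{n}{2} - \frac{n}{6} = \frac{n}{3}$, $\frac{n}{2} - c = r$ and $c < \frac{n}{6} \Leftrightarrow
\frac{n}{3} < r \le \frac{n}{2} \Rightarrow k = 4$.
Clearly (i) and (ii) are equivalent statements.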


21.3.2 Some Results on Circulant Partial Hadamard Matrix
r — H(k x n) Obtained with the Help of Column Sum 
Restrictions and k = 3 


The main purpose of this section is to describe the column sum pattern in a circulant
partial Hadamard matrix $r$-$H(k \times n)$ for $k = 3$. The pattern is given by
the following theorem:


Theorem 21.2 Let $r$-$H(k \times n)$ be a circulant partial Hadamard matrix, and let
$C_i$ be the $i$th column sum of $H$ for all $i = 1, 2, \ldots, n$. Then the following results
hold:


(i) If $C_i > 0$ for all $i = 1, 2, \ldots, n$, then $r > 0$.
(ii) If $C_i \in \{1, -1\}$ for all $i = 1, 2, \ldots, n$, then $k = 1$.
(iii) If $k = 3$, then at least one $C_i$ must be equal to $\pm 3$.
(iv) If $k = 3$, then at least one $C_i$ must be equal to $\pm 1$.
(v) If $k = 3$, then the difference between the number of $C_i$'s equal to $+1$ and to $-1$ is a
multiple of 3.
(vi) If $k = 3$, then $n \ge 8$.


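Proof (i) For a given circulant partial Hadamard matrix $r$-$H(k \times n)$, we have

$$C_1 + C_2 + C_3 + \cdots + C_n = rk. \qquad (21.1)$$

If possible, let $r = 0$; then $C_1 + C_2 + C_3 + \cdots + C_n = 0$ (from (21.1) and $k \ge 1$),
which contradicts that $C_i > 0$ for all $i = 1, 2, \ldots, n$. So, $r > 0$.

(ii) Since $HH^T = nI_k$, we have $\sum_{i=1}^{n} C_i^2 = \mathbf{1}^T HH^T \mathbf{1} = nk$. Hence $C_i \in \{1, -1\}$ for all
$i = 1, 2, \ldots, n \Rightarrow \sum_{i=1}^{n} C_i^2 = n \Rightarrow nk = n \Rightarrow k = 1$.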


Note 21.1 From the above results, we can conclude that k > 1 implies that all col- 
umn sums cannot be +1. 


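(iii) If $k = 3$, then the possible columns are given by

$$\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} \text{ or } \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}.$$

Notice that in all the above cases $C_i = \pm 3$ or $\pm 1$. Hence $C_i = \pm 3$ for at least
one $i \in \{1, 2, \ldots, n\}$, otherwise $k$ would be 1 by (ii).

(iv) We prove this result by contradiction. If $C_i \in \{3, -3\}$ for all $i$, we must have $\sum_{i=1}^{n} C_i^2 =
9n \Rightarrow 3n = 9n \Rightarrow 3 = 9$, which is a contradiction.

(v) Let $x$, $y$, $z$ and $w$ be the number of $C_i$'s equal to $1$, $-1$, $3$ and $-3$, respectively.
Then $x - y + 3(z - w) = 3r$. This implies that $(x - y)$ is a multiple of 3.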

(vi) Suppose, if possible, that $n = 4$. Then $C_1^2 + C_2^2 + C_3^2 + C_4^2 = 12 \Rightarrow C_1 + C_2 + C_3 +
C_4 \in \{0, \pm 2, \pm 4, \pm 6\}$. Again, since $C_1 + C_2 + C_3 + C_4 = 3r$, the sum $(C_1 +
C_2 + C_3 + C_4)$ is divisible by 3. This implies that $(C_1 + C_2 + C_3 + C_4) \in
\{0, \pm 6\}$. Now, $C_1 + C_2 + C_3 + C_4 = 6 \Rightarrow rk = 6 \Rightarrow 3r = 6 \Rightarrow r = 2 \Rightarrow r =
\frac{n}{2}$ (since $n = 4$).
Thus, $k = 4$, which is a contradiction; therefore $(C_1 + C_2 + C_3 + C_4) \ne \pm 6$.
Next, if $C_1 + C_2 + C_3 + C_4 = 0 \Rightarrow r = 0$, then one of the column sums is 3
and the others are $-1, -1, -1$, or one column sum is $-3$ and the others are $1, 1, 1$.

Column sum 3 implies that one of the columns is $(1, 1, 1)^T = V_i$ (say). Since the
matrix is a circulant matrix, $V_{i+1} = (1, 1, 1)^T$ or $(-1, 1, 1)^T$; this implies
that $C_{i+1} = 3$ or $1$. This is a contradiction. Hence, $n \ne 4 \Rightarrow n > 4$. Since
we have a $0$-$H(3 \times 8)$ matrix given by

$$\begin{pmatrix} -1 & 1 & 1 & 1 & -1 & -1 & -1 & 1 \\ 1 & -1 & 1 & 1 & 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & 1 & 1 & 1 & -1 & -1 \end{pmatrix},$$

therefore, for $k = 3$, $n \ge 8$.
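
The $0$-$H(3 \times 8)$ matrix above gives a concrete instance on which the column sum identities used in this proof can be checked. The following Python sketch is an illustration under the assumption that each row is the right cyclic shift of the previous one (which matches the displayed matrix); the helper names `circulant_rows`, `is_partial_hadamard` and `column_sums` are introduced here and are not from the source.

```python
def circulant_rows(first_row, k):
    """First k rows of the circulant matrix whose row i is first_row shifted right i times."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(k)]


def is_partial_hadamard(h):
    """Check H H^T = n I_k for a k x n matrix h given as a list of rows."""
    k, n = len(h), len(h[0])
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(k) for j in range(k)
    )


def column_sums(h):
    return [sum(col) for col in zip(*h)]


FIRST_ROW = [-1, 1, 1, 1, -1, -1, -1, 1]
H = circulant_rows(FIRST_ROW, 3)
C = column_sums(H)
r, k, n = sum(FIRST_ROW), len(H), len(FIRST_ROW)

# Identities used in the proof of Theorem 21.2:
#   sum of column sums = r * k, and sum of squared column sums = n * k.
sum_identity = sum(C) == r * k
square_identity = sum(c * c for c in C) == n * k
```

In this example the column sums take the values $\pm 1$ and $\pm 3$, in line with parts (iii) to (v) of the theorem.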


21.3.3 Results on Circulant Partial Hadamard Matrices 
Which Reflect Types of Column Sums 


In this section, we have given a picture of different column sums in r — H(k x n) 
under some relations among r, and k in terms of the following theorems: 


Theorem 21.3 If $n = rk$ and $k > 1$, then all column sums of a circulant partial
Hadamard matrix $r$-$H(k \times n)$ cannot be equal.


Proof Suppose, if possible, that $C_i = C$ for all $i = 1, 2, \ldots, n$ in $r$-$H(k \times n)$. Then we
have $nC = rk = n \Rightarrow C = 1$ and $\sum_{i=1}^{n} C_i^2 = nC^2 = nk$. This implies that $k$ equals
1, which is a contradiction.


Theorem 21.4 Let $n = rk$ in a circulant partial Hadamard matrix $r$-$H(k \times n)$.
If $x$ and $y$ are the column sums in $H$, each of multiplicity $\frac{n}{2}$, then $k \le 2$.
Proof We must have $\frac{n}{2}(x + y) = n = rk$ and $\frac{n}{2}(x^2 + y^2) = nk$. This implies that
$x + y = 2$ and $\frac{n}{2}((x + y)^2 - 2xy) = nk$. It follows that $2 - k = xy$. Since $xy \ge 0$,
therefore, $k \le 2$.


Theorem 21.5 Let $r$-$H(k \times n)$ be a circulant partial Hadamard matrix with
constant column sum $C$ and $rk = n$; then $C = \sqrt{k}$.


Proof Constant column sum in $H$ implies that $nC^2 = nk$. It follows that
$C = \sqrt{k}$. □


21.3.4 Some Results on Augmentation of Circulant Partial 
Hadamard Matrices 


In this section, we have given techniques to increase the number of columns in a cir- 
culant partial Hadamard matrix r — H(k x n) with the help of proper augmentations. 
The results have been shown in terms of the following theorems. 


Theorem 21.6 Existence of an $r$-row regular circulant partial Hadamard matrix of
order $(k \times n)$ implies the existence of a circulant partial Hadamard matrix
$rm$-$H(k \times mn)$ for every positive integer $m$.


Proof An $rm$-$H(k \times mn)$ is given by $m$ copies of an $r$-row regular circulant partial
Hadamard matrix $A$ of order $(k \times n)$.
That is, $rm$-$H(k \times mn) = (A\ A\ \cdots\ A)$, with $m$ copies of $A$.
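
A small numerical illustration of this augmentation follows. It is a sketch only: the first row `(1, 1, 1, -1)` is an example chosen here (it generates a $2$-$H(2 \times 4)$), and the helper names are hypothetical.

```python
def circulant_rows(first_row, k):
    """First k rows of the circulant matrix whose row i is first_row shifted right i times."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(k)]


def is_partial_hadamard(h):
    """Check H H^T = n I_k for a k x n matrix h given as a list of rows."""
    k, n = len(h), len(h[0])
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(k) for j in range(k)
    )


def repeat_columns(h, m):
    """Place m copies of h side by side: (A A ... A)."""
    return [row * m for row in h]


A = circulant_rows([1, 1, 1, -1], 2)   # assumed example: a 2-H(2 x 4)
B = repeat_columns(A, 3)               # candidate 6-H(2 x 12)
```

The augmented matrix stays circulant because shifting the repeated first row cyclically is the same as repeating the shifted first row.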


Corollary 21.1 Existence of a circulant Hadamard matrix of order $n$ implies the
existence of a circulant partial Hadamard matrix $m\sqrt{n}$-$H(n \times mn)$.


Theorem 21.7 Let $\mathrm{circ}_{(k \times n)}(x_1, x_2, x_3, \ldots, x_n)$ and $\mathrm{circ}_{(k \times m)}(y_1, y_2, y_3, \ldots, y_m)$
be circulant partial Hadamard matrices with row sums $r_1$ and $r_2$, respectively.
Then $(\mathrm{circ}_{(k \times n)}(x_1, x_2, x_3, \ldots, x_n)\ \ \mathrm{circ}_{(k \times m)}(y_1, y_2, y_3, \ldots, y_m))$ is a circulant partial
Hadamard matrix if $x_{(n-i)} = y_{(m-i)}$ for all $i = 0, 1, 2, \ldots, (k - 1)$ and $m, n \ge k$.


Proof of this theorem is obvious, and a generalized version of this theorem can be 
stated as follows. 


Theorem 21.8 Let $A_i(k \times n_i) = \mathrm{circ}(x^i_1, x^i_2, x^i_3, \ldots, x^i_{n_i})$ be an $r_i$-row regular
circulant partial Hadamard matrix with $k \le n_i$ for all $i = 1, 2, 3, \ldots, t$. Then the
augmented matrix $(A_1\ A_2\ A_3 \cdots A_t)$ is a circulant partial Hadamard matrix
with row sum $\sum_{i=1}^{t} r_i$ if $x^{v}_{n_v - u} = x^{v+1}_{n_{v+1} - u}$ for all $u = 0, 1, 2, \ldots, (k - 1)$ and for all
$v = 1, 2, \ldots, (t - 1)$.


Proof Since $x^{v}_{n_v - u} = x^{v+1}_{n_{v+1} - u}$ for all $u = 0, 1, 2, \ldots, (k - 1)$ and for all $v =
1, 2, \ldots, (t - 1)$, and $k \le n_i$ for all $i = 1, 2, 3, \ldots, t$, the augmented
matrix $(A_1\ A_2\ A_3 \cdots A_t)$ is a circulant partial Hadamard matrix with $k$ rows.
Hence, $(A_1\ A_2\ A_3 \cdots A_t)$ is a circulant partial Hadamard matrix with row sum
$\sum_{i=1}^{t} r_i$.
21.3.5 Non-existence of Circulant Skew Hadamard Matrix
of Order n > 1


Theorem 21.9 There does not exist any circulant skew Hadamard matrix of order
$n > 1$.


Proof Let $H$ be a circulant skew Hadamard matrix of order $n > 1$. Let $C_1, C_2, \ldots, C_n$
be the column sums of $H$; then $\sum_{i=1}^{n} C_i = n\sqrt{n}$. This implies that the sum of all entries
of $H + H^T$ is $2n\sqrt{n}$. Since $H$ is a skew Hadamard matrix, $H + H^T = 2I_n$. It follows
that $2n\sqrt{n} = 2n$. This implies that $n = 1$, which is a contradiction.


21.3.6 One Part of Conjecture Given by Craigen et al. [2] 
for the Existence of 2-H(k x2k) 


Theorem 21.10 A $2$-$H(k \times 2k)$ exists if $k - 1$ is an odd prime power.


Proof The result is implied by the following two theorems: 


(i) For every odd prime power p’, there exist a negacyclic W(p’ + 1, p’). 
(Theorem 1.6 of [3, 14]) 

(ii) There exist a2 — H(k x 2k) if and only if there exist a negacyclic W(k, k — 1). 
(Theorem 8 of [2]) 


21.4 Thoughts and Conclusion 


For finding complete circulant partial Hadamard matrices $H(n)$ for $n > 4$, the
statement of Theorem 21.1 strengthens Ryser's conjecture for large $n$, where $k$ is always
4 in the described range. The results obtained in this paper are mainly
derived analytically from the basic properties of circulant partial Hadamard
matrices. Even so, this type of analytical approach towards the development of the
theory may open new paths for the invention of new techniques. Part (i) of
Theorem 21.2 shows that if $C_i > 0$ for all $i$ then $r \ne 0$, and therefore such an
$H(n)$ never exists for $r = 0$; the least order $n$ for $k = 3$ is given in Part
(vi) of Theorem 21.2, and the remaining results are obtained under some restrictions on
the column sums $C_i$ in circulant partial Hadamard matrices. Theorems 21.3, 21.4 and
21.5 describe the different column sums
in $r$-$H(k \times n)$ under some relations between $r$, $n$ and $k$. Theorems 21.6, 21.7 and
21.8 are based on the augmentation of partial Hadamard matrices. Theorem 21.9
shows the non-existence of circulant skew Hadamard matrices of order greater
than 1. Theorem 21.10 gives a proof of one part of the conjectures given by Craigen
et al. [2] on the existence of $2$-$H(k \times 2k)$. The column sum approach used in this
paper may give a new style of thinking for the solutions to many problems in this field.
All the theorems stated in this research paper strengthen
the theoretical part of circulant partial Hadamard matrices. Overall, the findings of this
research may be useful for scholars.
Chapter 22
On Weak Hypervector Spaces
Over a Hyperfield


Pallavi Panjarike, Syam Prasad Kuncham, Madeleine Al-Tahan, 
Vadiraja Bhatta, and Harikrishnan Panackal 


Mathematics Subject Classification (2010) 20N20 - 97H60 


22.1 Introduction 


In an algebraic structure, the composition of two elements is again an element in
the underlying set. A hyperoperation is a generalization of a binary operation in a
classical algebraic system, wherein the composition of two elements is a
non-empty set. For a non-empty set $S$, let $\mathcal{P}^*(S)$ be its power set excluding the empty set. A
hypergroupoid on $S$ is a mapping $\circ : S \times S \to \mathcal{P}^*(S)$. A semihypergroup $(S, \circ)$ is a
hypergroupoid which satisfies

$$(x \circ y) \circ w = x \circ (y \circ w), \quad \text{i.e.,} \quad \bigcup_{v \in x \circ y} v \circ w = \bigcup_{u \in y \circ w} x \circ u$$

for all $x, y, w$ in $S$. Using the notion of hyperoperations and $H_v$-operations, the
classical vector spaces are generalized in several ways. Scafati-Tallini [12] introduced the
idea of hypervector spaces over fields, in which the external operation was
considered as a hyperoperation. Vougiouklis [15] introduced the notion of $H_v$-vector spaces
and discussed the properties such as linear independence and spanning set. Ameri 
and Dehghan [2] studied further the concept of hypervector spaces over fields. They 
extended some of the fundamental results of vector spaces to hypervector spaces 
over fields, and explored the concept of dimension. Taghavi and Hosseinzadeh [13] 
studied weak hypervector spaces over a field using the idea of essential points. Roy 
and Samanta [11] explored hypervector spaces over a hyperfield by considering the 
action of a canonical hypergroup over a hyperfield with the external operation as 
a hyperoperation. Consequently, they have studied the quotient structure of these 
hypervector spaces. Furthermore, when the external operation is considered as a 
classical operation, Al-Tahan and Davvaz [1] investigated the properties of linear 
transformations and established the Rank Nullity theorem in hypervector spaces. 
The remaining part of the paper is organized as follows. In Sect. 22.2, we provide
necessary definitions and examples given by Al-Tahan and Davvaz [1], Viro [14],
and Davvaz and Leoreanu-Fotea [3] that are used throughout the paper. In Sect. 22.3,
we illustrate some remarkable examples of a weak hypervector space over a
hyperfield. In Sect. 22.4, we discuss the properties involving notions such as direct
sums and complements of subhyperspaces. In Sect. 22.5, we start with the notion of
linear transformation between two hypervector spaces and prove isomorphism
theorems. Unlike in classical vector spaces, an $n$-dimensional hypervector space need
not be isomorphic to $F^n$ (as shown in Example 22), where $F$ is a hyperfield. In
Sect. 22.6, we obtain the annihilator properties of linear functionals in hypervector
spaces. Throughout this paper, F denotes a Krasner hyperfield as studied by Al-Tahan 
and Davvaz [1], and V denotes a hypervector space unless otherwise specified. For 
the basic concepts and related results in classical vector spaces, we refer to Hoffman 
and Kunze [6]. 


22.2 Preliminaries 


In this section, we present necessary concepts that are used throughout the paper. 


Definition 22.1 [3] A semihypergroup $(H, \circ)$ is a hypergroup if for all $x \in H$, we
have
$x \circ H = H \circ x = H$.


Definition 22.2 [3] A hypergroup $(H, +)$ is called a quasicanonical hypergroup if


(i) there is $0 \in H$ such that $u + 0 = 0 + u = u$, for all $u \in H$,
(ii) for every $u \in H$, there exists one and only one $u' \in H$ such that
$0 \in (u + u') \cap (u' + u)$; $u'$ is denoted by $-u$,
(iii) $w \in u + v$ implies $v \in -u + w$ and $u \in w - v$, for all $u, v, w \in H$.
A canonical hypergroup is a hypergroup $(H, +)$ which is quasicanonical and
commutative.


Definition 22.3 [3] A weak hyperring is an algebraic hyperstructure $(R, +, \cdot)$ which
satisfies the following axioms:
(i) $(R, +)$ is a canonical hypergroup.
(ii) $(R, \cdot)$ is a semigroup with zero as a bilaterally absorbing element, i.e.,
$u \cdot 0 = 0 \cdot u = 0$,
(iii) $u(v + w) \subseteq uv + uw$ and $(u + v)w \subseteq uw + vw$,


for all $u, v, w \in R$.


A weak hyperring is called a Krasner hyperring if Condition (iii) of Definition 22.3 
holds with equality. A Krasner hyperring (R, +, -) is called commutative (with unit 
element) if (R, -) is a commutative semigroup (with unit element). 


Definition 22.4 [3] A Krasner hyperring is called a hyperfield, if (R \ {0}, -) is a 
group. 


Example 1 (Viro [14]) 


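Let $V$ be the set of all non-negative real numbers. We define the following
hyperoperations on $V$:


$$r_1 \,\triangle\, r_2 = \{ r \in V : |r_1 - r_2| \le r \le r_1 + r_2 \},$$

$$r_1 \vee r_2 = \begin{cases} \max\{r_1, r_2\}, & \text{if } r_1 \ne r_2; \\ [0, r_1], & \text{otherwise} \end{cases}$$

and

$$r_1 \odot r_2 = r_1 r_2$$

for all $r_1, r_2 \in V$. Then $(V, \triangle, \odot)$ and $(V, \vee, \odot)$ are hyperfields (these are called
triangle and ultra triangle hyperfields, respectively).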


Example 2 (Jun [8]) 


Let $F_2 = \{0, 1\}$. Define the hyperoperation $+$ and the operation $\cdot$ on $F_2$ by the following
tables:


 + | 0   1            · | 0   1
 0 | 0   1            0 | 0   0
 1 | 1   F_2          1 | 0   1

Then $(F_2, \oplus, \times)$ is a hyperfield.
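
Because $F_2$ has only two elements, the hyperfield axioms of Definitions 22.1 to 22.4 can be verified exhaustively. The sketch below is an illustration, not part of the source; the table `ADD` encodes the hyperaddition above, and the function names are assumptions made for this example.

```python
F2 = (0, 1)
# Hyperaddition table of Example 2: 1 + 1 is the whole set {0, 1}.
ADD = {(0, 0): {0}, (0, 1): {1}, (1, 0): {1}, (1, 1): {0, 1}}


def hadd(a, b):
    return ADD[(a, b)]


def set_add(s, t):
    """Hyperaddition extended to subsets: the union of a + b over a in s, b in t."""
    out = set()
    for a in s:
        for b in t:
            out |= hadd(a, b)
    return out


def neg(a):
    return a  # each element of F_2 is its own additive inverse


def check_hyperfield():
    checks = {
        "associative": all(set_add(hadd(a, b), {c}) == set_add({a}, hadd(b, c))
                           for a in F2 for b in F2 for c in F2),
        "commutative": all(hadd(a, b) == hadd(b, a) for a in F2 for b in F2),
        "reproductive": all(set_add({a}, set(F2)) == set(F2) for a in F2),
        "neutral": all(hadd(a, 0) == {a} for a in F2),
        "unique_inverse": all([b for b in F2 if 0 in hadd(a, b)] == [neg(a)] for a in F2),
        "reversible": all(v in hadd(neg(u), w) and u in hadd(w, neg(v))
                          for u in F2 for v in F2 for w in hadd(u, v)),
        # Krasner distributivity holds with equality: a(b + c) = ab + ac.
        "distributive": all({a * x for x in hadd(b, c)} == hadd(a * b, a * c)
                            for a in F2 for b in F2 for c in F2),
    }
    return checks


RESULT = check_hyperfield()
```

The nonzero elements form the trivial group $\{1\}$ under multiplication, so the remaining hyperfield condition needs no computation.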


Example 3 (Viro [14]) 


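Let $\mathbb{C}$ be the set of all complex numbers with the following hyperoperation:


$$x \smile y = \begin{cases} \{x\}, & \text{if } |x| > |y|; \\ \{y\}, & \text{if } |y| > |x|; \\ \{c \in \mathbb{C} : c \in S_{xy}\}, & \text{if } x + y \ne 0 \text{ and } |x| = |y|; \\ \{c \in \mathbb{C} : |c| \le |x|\}, & \text{if } x + y = 0 \end{cases}$$

and

$$x \odot y = xy.$$


Here, $S_{xy}$ is the shortest arc connecting $x$ to $y$ on the circle with $|x|$ as radius. Then
$(\mathbb{C}, \smile, \odot)$ is a hyperfield (called the tropical hyperfield; it is denoted by $\mathcal{T}\mathbb{C}$).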


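Definition 22.5 [1] Let $F$ be a hyperfield. A canonical hypergroup $(V, +)$ together
with a map $\cdot : F \times V \to V$ is called a hypervector space over $F$ if the following
conditions hold:


(i) $\alpha \cdot (u_1 + u_2) = \alpha \cdot u_1 + \alpha \cdot u_2$;
(ii) $(\alpha + \beta) \cdot u_1 = \alpha \cdot u_1 + \beta \cdot u_1$;
(iii) $\alpha \cdot (\beta \cdot u_1) = (\alpha\beta) \cdot u_1$;
(iv) $\alpha \cdot (-u_1) = (-\alpha) \cdot u_1 = -(\alpha \cdot u_1)$;
(v) $1 \cdot u_1 = u_1$,


for all $\alpha, \beta \in F$, $u_1, u_2 \in V$ and $1 \in F$.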


Example 4 (Al-Tahan and Davvaz [1]) 


Let F be a hyperfield. Then F is trivially a hypervector space over itself. 


Example 5 (Viro [14]) 


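Let $(V, \|\cdot\|)$ be a normed vector space over $\mathbb{C}$. We define $\smile : V \times V \to \mathcal{P}^*(V)$ by


$$u \smile v = \begin{cases} \{v\}, & \text{if } \|v\| > \|u\|; \\ \{u\}, & \text{if } \|u\| > \|v\|; \\ \left\{ \dfrac{\|u\|}{\|av + bu\|}(av + bu) : a, b \in \mathbb{R}^+ \right\} \cup \{v, u\}, & \text{if } v + u \ne 0 \text{ and } \|v\| = \|u\|; \\ \{w \in V : \|w\| \le \|u\|\}, & \text{if } v + u = 0. \end{cases}$$


Then $\mathcal{V} = (V, \smile)$ is a canonical hypergroup. Furthermore, $\mathcal{V}$ is a hypervector space
over $\mathcal{T}\mathbb{C}$ with respect to the external operation $(a, v) \mapsto av$ for all $a \in \mathcal{T}\mathbb{C}$ and $v \in V$.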


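Definition 22.6 [1] Let $F$ be a hyperfield. A canonical hypergroup $(V, +)$ together
with a map $\cdot : F \times V \to V$ is called a weak hypervector space over $F$ if the following
conditions hold:


(i) $\alpha \cdot (u_1 + u_2) \subseteq \alpha \cdot u_1 + \alpha \cdot u_2$;
(ii) $(\alpha + \beta) \cdot u_1 \subseteq \alpha \cdot u_1 + \beta \cdot u_1$;
(iii) $\alpha \cdot (\beta \cdot u_1) = (\alpha\beta) \cdot u_1$;
(iv) $\alpha \cdot (-u_1) = (-\alpha) \cdot u_1 = -(\alpha \cdot u_1)$;
(v) $1 \cdot u_1 = u_1$,

for all $\alpha, \beta \in F$, $u_1, u_2 \in V$ and $1 \in F$.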


Definition 22.7 [1] Let $V$ be a (weak) hypervector space over $F$. Then a non-empty
set $U \subseteq V$ is called a subhyperspace of $V$ if


(i) $u_1 - u_2 \subseteq U$;
(ii) $\alpha \cdot u_1 \in U$,

for all $u_1, u_2 \in U$ and $\alpha \in F$.

Proposition 22.1 ([1]) Let $V$ be a (weak) hypervector space over $F$. Then $\emptyset \ne W \subseteq
V$ is a subhyperspace of $V$ if and only if $\alpha \cdot u_1 + \beta \cdot u_2 \subseteq W$ for all $u_1, u_2 \in W$ and
$\alpha, \beta \in F$.


Definition 22.8 [1] Let $V$ be a (weak) hypervector space. Then $u_1, u_2, \ldots, u_p \in V$
are said to be linearly independent if $0 \in \sum_{i=1}^{p} \alpha_i u_i$ implies $\alpha_i = 0$ for all $1 \le i \le p$.


Definition 22.9 [1] Let $V$ be a (weak) hypervector space and $S = \{u_1, u_2, \ldots,
u_p\} \subseteq V$. Then $S$ is said to span $V$ if for each $v \in V$, there exist $\alpha_1, \alpha_2, \ldots, \alpha_p$ in $F$
such that $v \in \sum_{i=1}^{p} \alpha_i u_i$.


Definition 22.10 [1] Let $W$ be a subhyperspace of $V$. Define $\frac{V}{W} = \{v + W : v \in V\}$
with the hyperoperation $(v + W) \oplus (u + W) = (v + u) + W$ and $\alpha \times (u + W) = \alpha u +
W$, for all $\alpha \in F$ and $u, v \in V$.


Remark 22.1 [1] For a subhyperspace $W$ of $V$, $\left(\frac{V}{W}, \oplus, \times\right)$ is a hypervector space
over $F$.


Proposition 22.2 ([1]) Let $U_1, U_2$ be subhyperspaces of $V$. Then $U_1 \cap U_2$ and
$U_1 + U_2 = \{u + v : u \in U_1, v \in U_2\}$ are subhyperspaces of $V$.
22.3 Some Examples of Weak Hypervector Spaces


Throughout this paper, let V, W denote hypervector spaces unless otherwise spec- 
ified. In this section, we provide some examples, and we study a few properties of 
weak hypervector spaces that are different from classical vector spaces. 


Example 6 


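Let $M = \left\{ \begin{bmatrix} k_1 & k_2 \\ k_3 & k_4 \end{bmatrix} : k_1, k_2, k_3, k_4 \in F_2 \right\}$ be the collection of all $2 \times 2$ matrices over
$F_2$. The hyperoperation addition $+_M$ on $M$ is defined as follows. For any $A =
\begin{bmatrix} k_1 & k_2 \\ k_3 & k_4 \end{bmatrix}$, $B = \begin{bmatrix} k_1' & k_2' \\ k_3' & k_4' \end{bmatrix} \in M$,

$$A +_M B = \left\{ \begin{bmatrix} t_1 & t_2 \\ t_3 & t_4 \end{bmatrix} : t_s \in k_s \oplus k_s',\ s = 1, 2, 3, 4 \right\}.$$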

For instance, 


fro} [ro] =f L19}- Lo) [oa] -[oe]} 
fro) [ro] =t to) -[oo] 


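We claim that $(M, +_M)$ is a canonical hypergroup. Clearly $+_M$ is well defined. Let
$A = [a_{ij}]$, $B = [b_{ij}]$, $C = [c_{ij}] \in M$. Now

$$(A +_M B) +_M C = \left\{ \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix} : t_{ij} \in a_{ij} \oplus b_{ij},\ i, j = 1, 2 \right\} +_M \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

$$= \left\{ \begin{bmatrix} t'_{11} & t'_{12} \\ t'_{21} & t'_{22} \end{bmatrix} : t'_{ij} \in t_{ij} \oplus c_{ij},\ t_{ij} \in a_{ij} \oplus b_{ij},\ i, j = 1, 2 \right\}$$

$$= \left\{ \begin{bmatrix} t'_{11} & t'_{12} \\ t'_{21} & t'_{22} \end{bmatrix} : t'_{ij} \in (a_{ij} \oplus b_{ij}) \oplus c_{ij} = a_{ij} \oplus (b_{ij} \oplus c_{ij}),\ i, j = 1, 2 \right\}$$

$$= A +_M (B +_M C).$$

Clearly $A +_M M \subseteq M$. Let $B = [b_{ij}] \in M$ where $b_{ij} \in F_2$. As $F_2$ is a hypergroup,
we get $b_{ij} \in a_{ij} \oplus F_2$. So, there exists $t_{ij} \in F_2$ such that $b_{ij} \in a_{ij} \oplus t_{ij}$. Take $T =
\begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}$. Then $B \in A +_M T \subseteq A +_M M = M +_M A$. Therefore $(M, +_M)$ is a
hypergroup. Further we can verify that $(M, +_M)$ is a canonical hypergroup. Define

an external operation $\cdot : F_2 \times M \to M$ by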
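$$\alpha \cdot A = \alpha \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \alpha a_{12} \\ \alpha a_{21} & \alpha a_{22} \end{bmatrix}.$$

Then $(M, +_M)$ forms a weak hypervector space over $F_2$. In fact,

$$\alpha_1 \cdot (A +_M B) = \left\{ \begin{bmatrix} \alpha_1 t_{11} & \alpha_1 t_{12} \\ \alpha_1 t_{21} & \alpha_1 t_{22} \end{bmatrix} : t_{ij} \in a_{ij} \oplus b_{ij},\ i, j = 1, 2 \right\}$$

$$= \left\{ \begin{bmatrix} t'_{11} & t'_{12} \\ t'_{21} & t'_{22} \end{bmatrix} : t'_{ij} \in \alpha_1 (a_{ij} \oplus b_{ij}) = \alpha_1 a_{ij} \oplus \alpha_1 b_{ij},\ i, j = 1, 2 \right\}$$

$$= \alpha_1 \cdot A +_M \alpha_1 \cdot B,$$

for all $\alpha_1 \in F_2$, $A, B \in M$.

$$(\alpha_1 \oplus \alpha_2) \cdot A = \left\{ t \cdot \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} : t \in \alpha_1 \oplus \alpha_2 \right\}$$

$$\subseteq \left\{ \begin{bmatrix} t'_{11} & t'_{12} \\ t'_{21} & t'_{22} \end{bmatrix} : t'_{ij} \in \alpha_1 a_{ij} \oplus \alpha_2 a_{ij},\ i, j = 1, 2 \right\} = \alpha_1 \cdot A +_M \alpha_2 \cdot A,$$

for all $\alpha_1, \alpha_2 \in F_2$, $A \in M$. In a similar way, we can verify the other axioms of weak
hypervector spaces. So $(M, +_M, \cdot)$ is a weak hypervector space over $F_2$.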


. 01 00; |O1 OO; |. 
Evidently, U -| E 4 ‘ ' 3 ‘ lo o| ; k 4 is a subhyperspace of M. 


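More generally, let $F$ be any hyperfield and let $M_{m \times n}$ denote the set of all $m \times n$
matrices with entries from the hyperfield $F$. Define $+_{M_{m \times n}}$ on $M_{m \times n}$ by

$$A +_{M_{m \times n}} B = \{ (t_{ij}) : t_{ij} \in a_{ij} \oplus b_{ij} \}$$

for all $A, B \in M_{m \times n}$. We define scalar multiplication $\cdot : F \times M_{m \times n} \to M_{m \times n}$ by
$\alpha \cdot A = \alpha \cdot (a_{ij}) = (\alpha a_{ij})$ for all $\alpha \in F$ and $A \in M_{m \times n}$. Then $(M_{m \times n}, +_{M_{m \times n}}, \cdot)$ is a
weak hypervector space over $F$.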


Example 7 


Let $X \ne \emptyset$ and let $F^X$ be the set of all functions from $X$ to $F$. Then $F^X$ is a weak
hypervector space over $F$, where $\oplus : F^X \times F^X \to \mathcal{P}^*(F^X)$ and $\times : F \times F^X \to F^X$
are defined as $(f \oplus g)(u) = f(u) + g(u)$ and $(\alpha \times f)(u) = \alpha \cdot f(u)$, for all $f, g \in
F^X$, $\alpha \in F$, $u \in X$. We give the following illustration to show that Condition (ii) of
Definition 22.5 fails to hold. Take $E = \{a, b\}$. Then $F_2^E$ is a weak hypervector space
over $F_2$. Here $F_2^E = \{f_0, f_1, f_2, f_3\}$ where $f_0(a) = f_0(b) = 0$, $f_1(a) = 0$, $f_1(b) =
1$, $f_2(a) = 1$, $f_2(b) = 0$, and $f_3(a) = f_3(b) = 1$. Now we can see that $(1 + 1) f_3 =
\{f_0, f_3\} \subsetneq 1 \times f_3 \oplus 1 \times f_3 = F_2^E$.
Example 8


From Example 1, $\mathcal{V} = (V, \triangle)$ is a canonical hypergroup and $K = (V, \vee, \odot)$ is a
hyperfield. With the external operation $(a, v) \mapsto av$, $\mathcal{V}$ is a weak hypervector space
over $K$. Here Condition (ii) of Definition 22.5 fails to hold. For instance, for $2, 3 \in
V$ and $5 \in V$, we have $(2 \vee 3) \cdot 5 = 15$, whereas $2 \cdot 5 \,\triangle\, 3 \cdot 5 = [5, 25]$.


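Proposition 22.3 Let $V$ be a weak hypervector space over $F$, and $0 \neq a \in F$. Then $a \cdot 0 = 0$.


Proof For any $t \in a \cdot 0 - a \cdot 0$, we have $a \cdot 0 \in t + a \cdot 0$. Then $0 \in a^{-1} \cdot t + 0 = \{a^{-1} \cdot t\}$. This implies $0 = a^{-1} \cdot t$ and so $a \cdot 0 = t$. That is, $a \cdot 0 - a \cdot 0 = \{a \cdot 0\}$. Note that $a \cdot 0 = a \cdot (-0) = -a \cdot 0$, and so $0 \in a \cdot 0 - a \cdot 0 = \{a \cdot 0\}$, which gives $0 = a \cdot 0$.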


From the axioms of a classical vector space, we can conclude that 0- x = 0. This 
property is preserved in hypervector spaces as well. However, in a weak hypervector 
space, this property is no longer true as we show in the following example. 


Example 9 


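Consider $\mathbb{F}_2$ as discussed in Example 2. We define an external operation $* : \mathbb{F}_2 \times \mathbb{F}_2 \to \mathbb{F}_2$ by $a * x = x$ for all $a, x \in \mathbb{F}_2$. Then we can see that $\mathbb{F}_2$ is a weak hypervector space over $\mathbb{F}_2$ (in fact, $(0 + 1) * 1 = 1 \in 1 + 1 = (0 * 1) + (1 * 1)$), whereas $0 * 1 = 1 \neq 0$.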
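The two distributivity conditions and the failure of $0 * x = 0$ in Example 9 can be verified exhaustively. This is a minimal sketch assuming $\mathbb{F}_2$ is the Krasner hyperfield with $1 + 1 = \{0, 1\}$; `hadd` and `star` are hypothetical names chosen here.

```python
def hadd(a, b):
    """Hyperaddition on {0, 1}, assuming F_2 is the Krasner hyperfield (1 + 1 = {0, 1})."""
    if a == 0:
        return {b}
    if b == 0:
        return {a}
    return {0, 1}

def star(a, x):
    """External operation of Example 9: a * x = x for every scalar a."""
    return x

F2 = (0, 1)

# (a + b) * x is contained in (a * x) + (b * x) for all a, b, x.
weak_distributive = all(
    {star(c, x) for c in hadd(a, b)} <= hadd(star(a, x), star(b, x))
    for a in F2 for b in F2 for x in F2
)

# a * (x + y) equals (a * x) + (a * y) for all a, x, y.
distributive_over_vectors = all(
    {star(a, t) for t in hadd(x, y)} == hadd(star(a, x), star(a, y))
    for a in F2 for x in F2 for y in F2
)

# Torsion-freeness would require 0 * x = 0 for every x.
torsion_free = all(star(0, x) == 0 for x in F2)

print(weak_distributive, distributive_over_vectors, torsion_free)
```

Both distributivity checks succeed, while the last flag is false because $0 * 1 = 1$.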


This observation leads to the following generalization. 


Proposition 22.4 Let $F$ be a hyperfield and $(V, +)$ be a non-trivial canonical hypergroup such that $x = -x$ and $x \in x + x$ for all $x \in V$. Then $(V, +)$ is a weak hypervector space with respect to the external operation $\cdot : F \times V \to V$ defined by $a \cdot x = x$ for all $a \in F$ and $x \in V$.


Proof Let $\alpha, \beta \in F$ and $v, w \in V$. Then $\alpha \cdot (v + w) = v + w = \alpha \cdot v + \alpha \cdot w$. Also, we have $(\alpha + \beta) \cdot v = \{v\} \subseteq v + v = \alpha \cdot v + \beta \cdot v$. In a similar manner, we can prove the rest of the conditions of a weak hypervector space. Hence, $V$ is a weak hypervector space over $F$. But by definition, for $0 \neq x \in V$, $0 \cdot x = x \neq 0$.


Definition 22.11 A weak hypervector space $V$ over $K$ is said to be torsion-free if $0 \cdot v = 0$ for all $v \in V$.


Remark 22.2 (i) Every hypervector space is torsion-free.
(ii) The weak hypervector spaces discussed in Proposition 22.4 are not torsion-free.


Example 10


$\mathcal{T}\mathbb{C}$ is a torsion-free weak hypervector space over the phase hyperfield $\Phi$ defined in Example 12.


Theorem 22.1 ([14]) Any subset $A$ of $\mathcal{T}\mathbb{C}$ containing 0, invariant under the involution $x \mapsto -x$ and closed with respect to multiplication, inherits the structure of weak hyperring from $\mathcal{T}\mathbb{C}$. Furthermore, if $A \setminus \{0\}$ is invariant under the involution $x \mapsto \frac{1}{x}$, then $A$ with the inherited structure is a hyperfield.


Example 11 (Viro [14]) 


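$\mathbb{R}$ inherits the structure of hyperfield from $\mathcal{T}\mathbb{C}$ with

\[
r_1 \circ_{\mathbb{R}} r_2 = (r_1 \circ r_2) \cap \mathbb{R} =
\begin{cases}
\{r_1\}, & \text{if } |r_1| > |r_2|; \\
\{r_2\}, & \text{if } |r_2| > |r_1|; \\
\{r_1\}, & \text{if } r_1 + r_2 \neq 0 \text{ and } |r_1| = |r_2|; \\
[-|r_1|, |r_1|], & \text{if } r_1 + r_2 = 0.
\end{cases}
\]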
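The four cases of the inherited hyperaddition on $\mathbb{R}$ translate directly into a small function. This is only an illustrative sketch of the case distinction as displayed above; the name `real_tropical_add` and the tagged-tuple return format (`"point"` for a singleton, `"interval"` for a closed interval) are hypothetical choices, not notation from the source.

```python
def real_tropical_add(r1, r2):
    """Hypersum of two reals under the hyperaddition inherited from the tropical hyperfield.

    Returns ("point", r) for the singleton {r}, or ("interval", lo, hi)
    for the closed interval [lo, hi].
    """
    if abs(r1) > abs(r2):
        return ("point", r1)
    if abs(r2) > abs(r1):
        return ("point", r2)
    if r1 + r2 != 0:
        # Equal absolute values and non-zero sum: r1 == r2.
        return ("point", r1)
    # r1 + r2 == 0: the whole interval between -|r1| and |r1|.
    return ("interval", -abs(r1), abs(r1))

print(real_tropical_add(3, -2))
print(real_tropical_add(2, 2))
print(real_tropical_add(2, -2))
```

The first two calls return singletons, and the last returns the interval $[-2, 2]$, which is where the operation is genuinely multivalued.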


Example 12 (Viro [14]) 


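The set $\Phi = \{z \in \mathbb{C} : |z| = 1\} \cup \{0\}$ also inherits the hyperfield structure from $\mathcal{T}\mathbb{C}$. The hyperoperation addition is as follows:

\[
z_1 \circ_{\Phi} z_2 = (z_1 \circ z_2) \cap \Phi =
\begin{cases}
\{c \in \mathbb{C} : c \in S_{z_1 z_2}\}, & \text{if } z_1 + z_2 \neq 0; \\
\Phi, & \text{otherwise.}
\end{cases}
\]

The hyperfield $(\Phi, \circ_{\Phi})$ discussed in Example 12 is called the phase hyperfield.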


Example 13 


Consider the tropical hyperfield $\mathcal{T}\mathbb{C}$ in Example 3. The set $J = \{1, i, -1, -i\} \cup \{0\}$ also inherits the hyperfield structure from $\mathcal{T}\mathbb{C}$. The hyperoperation addition on $J$ is as follows:

\[
x_1 \circ_J x_2 = (x_1 \circ x_2) \cap J =
\begin{cases}
\{x_1, x_2\}, & \text{if } x_1 + x_2 \neq 0; \\
J, & \text{otherwise.}
\end{cases}
\]


Example 14 (Viro [14])


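Take $S = \{-1, 0, 1\}$. Then $S$ inherits the hyperfield structure from $\mathcal{T}\mathbb{C}$, with $a \circ_S b = (a \circ b) \cap S$. This hyperfield is also called the hyperfield of signs.


Theorem 22.2 Let $(A, \circ_A)$ be a hyperfield inherited from $(\mathcal{T}\mathbb{C}, \circ)$. Then $\mathcal{T}\mathbb{C}$ is a weak hypervector space over $A$ with respect to the external operation $\cdot : A \times \mathcal{T}\mathbb{C} \to \mathcal{T}\mathbb{C}$ defined by $a \cdot v = av$ for all $a \in A$ and $v \in \mathcal{T}\mathbb{C}$.


Proof We have $a \cdot (v \circ w) = a(v \circ w) = av \circ aw$ for all $a \in A$ and $v, w \in \mathcal{T}\mathbb{C}$. Also, $(a \circ_A b) \cdot v = ((a \circ b) \cap A) v \subseteq (a \circ b) v = av \circ bv$, for all $a, b \in A$ and $v \in \mathcal{T}\mathbb{C}$. The rest of the axioms follow directly from the properties of $\mathcal{T}\mathbb{C}$.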


Example 15 


From Example 13, we have that $J$ is an inherited hyperfield. By Theorem 22.2, it is interesting to note that $\mathcal{T}\mathbb{C}$ is a weak hypervector space over the finite hyperfield $J$.


Definition 22.12 Let $V$ be a weak hypervector space and $B = \{v_1, v_2, \ldots, v_n\} \subseteq V$. $B$ is called a weak basis if it is a linearly independent spanning subset. A weak basis $B$ is called a basis if every element of $V$ belongs to a unique linear combination of elements of $B$.


Remark 22.3 If $V$ is a hypervector space, then the notions of weak basis and basis coincide.


Remark 22.4 In a classical finite dimensional vector space, every element can be 
expressed as a unique linear combination of elements in its basis. This property 
plays a key role in proving many structural results of a vector space. Analogously, 
in a hypervector space, every element belongs to a unique linear combination of 
elements in its basis. However, this property fails to hold in a weak hypervector 
space. Indeed, in a weak hypervector space every linearly independent and spanning 
set need not inherit this property. 


Consider the following examples. 


Example 16 


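Let $\mathbb{F}_2$ be the hyperfield of Example 2 and $\mathbb{F}_2^2 = \{0, u_1, u_2, u_3\}$ where $0 = (0, 0)$, $u_1 = (0, 1)$, $u_2 = (1, 0)$, $u_3 = (1, 1)$. Define $\oplus$ on $\mathbb{F}_2^2$ as follows.

\[
\begin{array}{c|cccc}
\oplus & 0 & u_1 & u_2 & u_3 \\ \hline
0 & \{0\} & \{u_1\} & \{u_2\} & \{u_3\} \\
u_1 & \{u_1\} & \{0, u_1\} & \{u_3\} & \{u_2, u_3\} \\
u_2 & \{u_2\} & \{u_3\} & \{0, u_2\} & \{u_1, u_3\} \\
u_3 & \{u_3\} & \{u_2, u_3\} & \{u_1, u_3\} & \mathbb{F}_2^2
\end{array}
\]


Then $(\mathbb{F}_2^2, \oplus)$ is a canonical hypergroup.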

We define $\cdot : \mathbb{F}_2 \times \mathbb{F}_2^2 \to \mathbb{F}_2^2$ by $a \cdot (x, y) = (a \cdot x, a \cdot y)$. Then $(\mathbb{F}_2^2, \oplus)$ is a weak hypervector space over $\mathbb{F}_2$. Here $\{u_1, u_2\}$ is a weak basis for $\mathbb{F}_2^2$. We can observe that every element can be expressed as a unique linear combination of $u_1$ and $u_2$. But this uniqueness need not be achieved with respect to every weak basis of $\mathbb{F}_2^2$. For example, we consider the weak basis $\{u_1, u_3\}$, where $u_3$ belongs to two distinct linear combinations, that is, $u_3 \in 1 \cdot u_1 \oplus 1 \cdot u_3$ and $u_3 \in 0 \cdot u_1 \oplus 1 \cdot u_3$.
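The hyperaddition table and the non-uniqueness claim of Example 16 can be reproduced mechanically. The sketch below assumes $\mathbb{F}_2$ is the Krasner hyperfield ($1 + 1 = \{0, 1\}$) and that $\oplus$ on $\mathbb{F}_2^2$ acts coordinatewise, which agrees with every entry of the table; the helper names `hadd`, `vadd`, `smul` and `combos` are hypothetical.

```python
def hadd(a, b):
    """Hyperaddition on {0, 1}, assuming F_2 is the Krasner hyperfield (1 + 1 = {0, 1})."""
    if a == 0:
        return {b}
    if b == 0:
        return {a}
    return {0, 1}

ZERO, U1, U2, U3 = (0, 0), (0, 1), (1, 0), (1, 1)

def vadd(p, q):
    """Coordinatewise hypersum on F_2^2: all (x, y) with x in p0 + q0 and y in p1 + q1."""
    return {(x, y) for x in hadd(p[0], q[0]) for y in hadd(p[1], q[1])}

def smul(a, p):
    """Scalar multiple a . (x, y) = (a x, a y)."""
    return (a * p[0], a * p[1])

def combos(target, b1, b2):
    """All coefficient pairs (a, b) with target in a.b1 (+) b.b2."""
    return [(a, b) for a in (0, 1) for b in (0, 1)
            if target in vadd(smul(a, b1), smul(b, b2))]

print(vadd(U1, U3))          # the entry in row u1, column u3
print(combos(U3, U1, U2))    # coefficient pairs with respect to {u1, u2}
print(combos(U3, U1, U3))    # coefficient pairs with respect to {u1, u3}
```

With respect to $\{u_1, u_2\}$ only the pair $(1, 1)$ works for $u_3$, whereas with respect to $\{u_1, u_3\}$ both $(0, 1)$ and $(1, 1)$ do.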


Example 17 


In Example 13, one can see that $v = 1 + i$ and $w = 2$ are linearly independent elements and $2i \in \operatorname{span}\{v, w\}$. But we can observe that $2i$ is in two distinct linear combinations of $v$ and $w$. That is, $2i \in v \circ (-w)$ and $2i \in iv \circ w$.


If a weak hypervector space $V$ has a finite weak basis $B$, then we say that $V$ is finite dimensional and we write $\dim V = |B|$; otherwise $V$ is infinite dimensional.


Example 18


$\mathcal{T}\mathbb{C}$ is a weak hypervector space over $(\mathbb{R}, \circ_{\mathbb{R}})$ of dimension 2 with $\{1, i\}$ as its weak basis.


Example 19


$\mathcal{T}\mathbb{C}$ is a weak hypervector space over the phase hyperfield $(\Phi, \circ_{\Phi})$ as discussed in Example 10. It is an infinite dimensional weak hypervector space with $\mathbb{R}^+$ as its weak basis.


Example 20


$\mathcal{T}\mathbb{C}$ is an infinite dimensional weak hypervector space over $(S, \circ_S)$. Here, $B = \{x, yi : x, y \in \mathbb{R}^+\}$ is a basis for $\mathcal{T}\mathbb{C}$.


22.4 Properties of Subhyperspaces


Let $V$, $W$ denote hypervector spaces unless otherwise specified. In this section, we define the direct sum of subhyperspaces and the complement of a subhyperspace, and finally we prove $\dim(U_1) + \dim(U_2) = \dim(U_1 \cap U_2) + \dim(U_1 + U_2)$.


Definition 22.13 (i) A subhyperspace $W$ of $V$ is said to be a direct sum of subhyperspaces $S$ and $S'$ if to each $w \in W$, there exist unique $u \in S$ and $v \in S'$ such that $w \in u + v$; we denote this by $W = S \oplus S'$.

(ii) If $V = S \oplus S'$ then we say $S'$ is a complement of $S$.


Theorem 22.3 $V$ is a direct sum of the subhyperspaces $U_1$ and $U_2$ if and only if $V = U_1 + U_2$ and $U_1 \cap U_2 = \{0\}$.


Proof Suppose that $V = U_1 + U_2$ and $U_1 \cap U_2 = \{0\}$. Let $v \in V$ be such that $v \in w_1 + w_2$ and $v \in w'_1 + w'_2$, where $w_i, w'_i \in U_i$, $i = 1, 2$. Then $0 \in v - v \subseteq w_1 - w'_1 + w_2 - w'_2$. The latter implies that there exist $t_1 \in w_1 - w'_1$ and $t_2 \in w_2 - w'_2$ such that $0 \in t_1 + t_2$. Now, $t_1 = -t_2 \in w_1 - w'_1 \subseteq U_1$ implies $t_2 \in U_1$. Therefore, $t_2 \in U_1 \cap U_2 = \{0\}$, and so $t_2 = 0$ and consequently $t_1 = 0$. This shows that $0 \in w_1 - w'_1$ and $0 \in w_2 - w'_2$ and hence, $w_1 = w'_1$ and $w_2 = w'_2$.
Conversely, let $x \in U_1 \cap U_2$. Then, $x = x + 0$ where $x \in U_1$ and $0 \in U_2$, and $x = 0 + x$ where $0 \in U_1$ and $x \in U_2$. By uniqueness of representation, we get $x = 0$. Hence $U_1 \cap U_2 = \{0\}$.


Lemma 22.1 ([1]) Every linearly independent subset of $V$ is contained in a basis of $V$.


Proposition 22.5 ([1]) Let $B = \{v_1, v_2, \ldots, v_p\}$ be a basis of $V$. Then each $v \in V$ belongs to a unique linear combination of $v_1, v_2, \ldots, v_p$.


The following theorem shows the existence of complementary subhyperspace in 
hypervector spaces. 


Theorem 22.4 Every subhyperspace of a hypervector space $V$ has a complement.


Proof Let $S$ be a subhyperspace of $V$ and $B = \{u_1, u_2, \ldots, u_k\}$ be a basis of $S$. Then by Lemma 22.1, $B$ is contained in a basis $B'$ of $V$, say $B' = \{u_1, u_2, \ldots, u_k, w_1, w_2, \ldots, w_m\}$. Take $S' = \operatorname{span}\{w_1, w_2, \ldots, w_m\}$. We prove that $V = S \oplus S'$. Let $v \in V$. By Proposition 22.5, there exists a unique linear combination $\sum_{i=1}^{k} a_i u_i + \sum_{j=1}^{m} b_j w_j$ such that $v \in \sum_{i=1}^{k} a_i u_i + \sum_{j=1}^{m} b_j w_j$. Then there exist $w \in \sum_{i=1}^{k} a_i u_i$ and $w' \in \sum_{j=1}^{m} b_j w_j$ such that $v \in w + w' \subseteq S + S'$. The other inclusion is obvious. Let $u \in S \cap S'$. Then $u \in \sum_{i=1}^{k} a_i u_i$ and $u \in \sum_{j=1}^{m} b_j w_j$ for some $a_i, b_j \in F$, $1 \leq i \leq k$, $1 \leq j \leq m$. This implies

\[
0 \in \sum_{i=1}^{k} a_i u_i - \left( \sum_{j=1}^{m} b_j w_j \right).
\]

As $B'$ is a basis, we get $a_i = 0$ for all $1 \leq i \leq k$ and $b_j = 0$ for all $1 \leq j \leq m$. Hence, $u = 0$. Therefore $S \cap S' = \{0\}$, and $V = S \oplus S'$ by Theorem 22.3.


Theorem 22.5 Let $U_1$ and $U_2$ be two subhyperspaces of $V$. Then

\[
\dim(U_1) + \dim(U_2) = \dim(U_1 \cap U_2) + \dim(U_1 + U_2).
\]
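Proof Let $\dim(U_1) = m$, $\dim(U_2) = n$ and $\dim(U_1 \cap U_2) = r$.

Let $B = \{u_1, u_2, \ldots, u_r\}$ be a basis of $U_1 \cap U_2$. Then by Lemma 22.1, $B$ is contained in a basis $B'$ of $U_1$ and a basis $B''$ of $U_2$, where

\[
B' = \{u_1, u_2, \ldots, u_r, w_1, w_2, \ldots, w_{m-r}\}
\quad \text{and} \quad
B'' = \{u_1, u_2, \ldots, u_r, w'_1, w'_2, \ldots, w'_{n-r}\}.
\]

Consider $L = \{u_1, u_2, \ldots, u_r, w_1, w_2, \ldots, w_{m-r}, w'_1, w'_2, \ldots, w'_{n-r}\}$. We prove that $L$ is a basis for $U_1 + U_2$. Suppose

\[
0 \in \sum_{i=1}^{r} a_i u_i + \sum_{j=1}^{m-r} b_j w_j + \sum_{k=1}^{n-r} c_k w'_k.
\]

Then there exist $t \in \sum_{k=1}^{n-r} c_k w'_k$ and $t' \in \sum_{i=1}^{r} a_i u_i + \sum_{j=1}^{m-r} b_j w_j$ such that $0 \in t + t'$, which implies that $-t = t'$ and shows that $t \in U_1$ and $t \in U_2$. Therefore, $t \in U_1 \cap U_2$. This gives $-t \in \sum_{i=1}^{r} d_i u_i$ for some $d_i$, where $1 \leq i \leq r$. Now,

\[
0 \in -t + t \subseteq \sum_{i=1}^{r} d_i u_i + \sum_{k=1}^{n-r} c_k w'_k.
\]

Since $B''$ is linearly independent, we get $d_i = 0$ for all $1 \leq i \leq r$ and $c_k = 0$ for all $1 \leq k \leq n - r$. So, $0 \in \sum_{i=1}^{r} a_i u_i + \sum_{j=1}^{m-r} b_j w_j$. As $B'$ is linearly independent, we get $a_i = 0$ for all $1 \leq i \leq r$ and $b_j = 0$ for all $1 \leq j \leq m - r$. Hence, $L$ is linearly independent. It remains to show that $\operatorname{span}(L) = U_1 + U_2$.

Let $v \in U_1 + U_2$. Then $v \in x_1 + x_2$ for some $x_1 \in U_1$ and $x_2 \in U_2$. Since $B'$ is a basis for $U_1$, there exist unique $a_i$ $(1 \leq i \leq r)$ and $b_j$ $(1 \leq j \leq m - r)$ such that

\[
x_1 \in \sum_{i=1}^{r} a_i u_i + \sum_{j=1}^{m-r} b_j w_j.
\]

Since $B''$ is a basis for $U_2$, there exist unique $d_i$ $(1 \leq i \leq r)$ and $c_k$ $(1 \leq k \leq n - r)$ such that

\[
x_2 \in \sum_{i=1}^{r} d_i u_i + \sum_{k=1}^{n-r} c_k w'_k.
\]

Therefore,

\[
\begin{aligned}
v \in x_1 + x_2 &\subseteq \sum_{i=1}^{r} a_i u_i + \sum_{j=1}^{m-r} b_j w_j + \sum_{i=1}^{r} d_i u_i + \sum_{k=1}^{n-r} c_k w'_k \\
&= \sum_{i=1}^{r} (a_i + d_i) u_i + \sum_{j=1}^{m-r} b_j w_j + \sum_{k=1}^{n-r} c_k w'_k.
\end{aligned}
\]

This means that $v \in \operatorname{span}(L)$.
Hence, $L$ is a basis for $U_1 + U_2$, and $\dim(U_1 + U_2) = m + n - r$.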


Corollary 22.1 If $V$ is a hypervector space, and $U_1$ and $U_2$ are subhyperspaces of $V$ with $V = U_1 \oplus U_2$, then $\dim(V) = \dim(U_1) + \dim(U_2)$.


22.5 Properties of Linear Transformations 


In this section, we study linear transformations on (weak) hypervector spaces, and prove that the set of all weak linear transformations forms a weak hypervector space.


Definition 22.14 [1] Let $V$ and $U$ be two hypervector spaces over a hyperfield $F$. A map $\mu : V \to U$ is called a linear transformation if

(i) $\mu(v + v') = \mu(v) + \mu(v')$;
(ii) $\mu(av) = a\mu(v)$,

for all $v, v' \in V$ and $a \in F$.

A bijective linear transformation is an isomorphism.


Definition 22.15 Let $V$ and $U$ be two weak hypervector spaces over a hyperfield $F$. A map $\mu : V \to U$ is called a weak linear transformation if

(i) $\mu(v + v') \subseteq \mu(v) + \mu(v')$;
(ii) $\mu(av) = a\mu(v)$,

for all $v, v' \in V$ and $a \in F$.


Remark 22.5 Every linear transformation between two hypervector spaces is a 
weak linear transformation. 


Example 21 


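As discussed in Example 16, $\mathbb{F}_2^2$ is a weak hypervector space over $\mathbb{F}_2$. Define $\mu_1 : \mathbb{F}_2^2 \to \mathbb{F}_2$ by $\mu_1(0) = \mu_1(u_1) = 0$, $\mu_1(u_2) = \mu_1(u_3) = 1$. Then $\mu_1$ is a linear transformation. The map $\mu_2 : \mathbb{F}_2^2 \to \mathbb{F}_2$ given by $\mu_2(0) = 0$, $\mu_2(u_1) = 1$, $\mu_2(u_2) = 1$, $\mu_2(u_3) = 1$ is a weak linear transformation that is not linear, since $\mu_2(u_1 \oplus u_3) = \mu_2(\{u_2, u_3\}) = \{1\}$ and $\mu_2(u_1) + \mu_2(u_3) = 1 + 1 = \{0, 1\}$, and hence $\mu_2(u_1 \oplus u_3) \subsetneq \mu_2(u_1) + \mu_2(u_3)$.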
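The difference between Definition 22.14 and Definition 22.15 can be tested exhaustively on the two maps of Example 21. The following sketch assumes the Krasner model of $\mathbb{F}_2$ and the coordinatewise hyperaddition of Example 16; all function names are hypothetical.

```python
def hadd(a, b):
    """Hyperaddition on {0, 1}, assuming F_2 is the Krasner hyperfield (1 + 1 = {0, 1})."""
    if a == 0:
        return {b}
    if b == 0:
        return {a}
    return {0, 1}

VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # 0, u1, u2, u3

def vadd(p, q):
    """Coordinatewise hypersum on F_2^2."""
    return {(x, y) for x in hadd(p[0], q[0]) for y in hadd(p[1], q[1])}

def smul(a, p):
    return (a * p[0], a * p[1])

def homogeneous(mu):
    """mu(a v) = a mu(v) for all scalars a and vectors v."""
    return all(mu(smul(a, p)) == a * mu(p) for a in (0, 1) for p in VECTORS)

def is_linear(mu):
    """Equality mu(p + q) = mu(p) + mu(q), plus homogeneity."""
    return homogeneous(mu) and all(
        {mu(t) for t in vadd(p, q)} == hadd(mu(p), mu(q))
        for p in VECTORS for q in VECTORS)

def is_weak_linear(mu):
    """Inclusion mu(p + q) within mu(p) + mu(q), plus homogeneity."""
    return homogeneous(mu) and all(
        {mu(t) for t in vadd(p, q)} <= hadd(mu(p), mu(q))
        for p in VECTORS for q in VECTORS)

def mu1(p):
    """mu1 of Example 21: 0, u1 -> 0 and u2, u3 -> 1 (the first coordinate)."""
    return p[0]

def mu2(p):
    """mu2 of Example 21: 0 -> 0 and every non-zero vector -> 1."""
    return 0 if p == (0, 0) else 1

print(is_linear(mu1), is_weak_linear(mu1))
print(is_linear(mu2), is_weak_linear(mu2))
```

Under these assumptions `mu1` passes both checks, while `mu2` passes only the weak one.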


Remark 22.6 In the classical sense, an $n$-dimensional vector space over a field $K$ is isomorphic to $K^n$. This is not true in general for weak hypervector spaces. In Example 1, $V$ is a weak hypervector space over $K$ with $\dim V = 1$. But $V$ is not isomorphic to $K$ because $K$ is a hypervector space over $K$.


Example 22


From Example 7, we have $\mathbb{F}_2^E = \{f_0, f_1, f_2, f_3\}$. Now we define a map $\eta : \mathbb{F}_2^E \to \mathbb{F}_2^2$ by $\eta(f_0) = 0$, $\eta(f_1) = u_1$, $\eta(f_2) = u_2$, and $\eta(f_3) = u_3$. Then $\eta$ is an isomorphism.


Proposition 22.6 ([1]) Let $V$ and $U$ be two hypervector spaces over a hyperfield $F$ and $\mu : V \to U$ be a (weak) linear transformation. Then $\mu(0_V) = 0_U$ and $\mu(-x) = -\mu(x)$ for all $x \in V$.


Definition 22.16 [1] Let $V$ and $U$ be two hypervector spaces over a hyperfield $F$ and $\mu : V \to U$ be a linear transformation. Then we define the kernel of $\mu$ as

\[
\ker \mu = \{v \in V : \mu(v) = 0\},
\]

and the image of $\mu$ as

\[
\operatorname{Im}(\mu) = \{\mu(v) : v \in V\}.
\]


Proposition 22.7 ([1]) Let $\mu : V \to U$ be a linear transformation. Then $\ker \mu$ is a subhyperspace of $V$ and $\operatorname{Im}(\mu)$ is a subhyperspace of $U$.


Proposition 22.8 ([1]) Let $T : U \to V$ be a linear transformation. Then $T$ is one-one if and only if $\ker T = \{0\}$.


This property fails if T is a weak linear transformation (see Example 23). 


Example 23 


Consider the map $\mu_2$ as defined in Example 21. Here, $\ker \mu_2 = \{0\}$ but $\mu_2$ is not one-one.


The following Proposition shows that a weak linear transformation between weak 
hypervector spaces carries a spanning set to a spanning set. 


Proposition 22.9 Let $U$ and $V$ be two weak hypervector spaces, $S = \{u_1, u_2, \ldots, u_p\}$ be a spanning set of $U$, and $T : U \to V$ be an onto weak linear transformation. Then $T(S)$ is a spanning set of $V$.


Proof Let $v \in V$. Since $T$ is onto, there exists $u \in U$ such that $T(u) = v$. We can find $a_i \in F$, $1 \leq i \leq p$, such that $u \in \sum_{i=1}^{p} a_i u_i$. Then

\[
v = T(u) \in T\left( \sum_{i=1}^{p} a_i u_i \right) \subseteq \sum_{i=1}^{p} a_i T(u_i),
\]

as desired.


Corollary 22.2 Let $T : U \to V$ be an isomorphism and $B$ a basis of $U$. Then $T(B)$ is a basis of $V$.


Theorem 22.6 ([1]) Let $U$ and $V$ be finite dimensional hypervector spaces, and $T : U \to V$ be a linear transformation. Then $\dim(\ker T) + \dim(\operatorname{Im} T) = \dim U$.


We summarize Proposition 22.8, Proposition 22.9, and Theorem 22.6 as follows. 


Theorem 22.7 Let $U$ and $V$ be two hypervector spaces with $\dim U = \dim V = n$ and $T : U \to V$ be a linear transformation. Then the following are equivalent:

(i) $T$ is bijective.
(ii) $T$ is one-one.
(iii) $T$ is onto.

Theorem 22.8 Let $\mu : V \to W$ be an onto linear transformation of hypervector spaces $V$ and $W$. Then $V / \ker \mu$ is isomorphic to $W$.


Proof Define $\eta : V / \ker \mu \to W$ by $\eta(v + \ker \mu) = \mu(v)$ for any $v + \ker \mu \in V / \ker \mu$.

To see that $\eta$ is well-defined, let $v + \ker \mu = v' + \ker \mu$ for $v, v' \in V$. Then $v \in v + 0 \subseteq v + \ker \mu = v' + \ker \mu$ implies that $v \in v' + t$ for some $t \in \ker \mu$. This means $\mu(v) \in \mu(v' + t) = \mu(v') + \mu(t)$. That is, $\mu(v) \in \mu(v') + 0 = \mu(v')$, and so $\eta(v + \ker \mu) = \eta(v' + \ker \mu)$.

Suppose that $\eta(v_1 + \ker \mu) = \eta(v_2 + \ker \mu)$. Now $\mu(v_1) = \mu(v_2)$ implies that $0 \in \mu(v_1) - \mu(v_2) = \mu(v_1 - v_2)$. This means there exists $x \in v_1 - v_2$ such that $\mu(x) = 0$. From this, $v_1 \in x + v_2$ and $v_2 \in v_1 - x$. So, $v_1 \in v_2 + \ker \mu$ and $v_2 \in v_1 + \ker \mu$. Therefore, $v_1 + \ker \mu \subseteq v_2 + \ker \mu$ and $v_2 + \ker \mu \subseteq v_1 + \ker \mu$. The latter implies that $v_1 + \ker \mu = v_2 + \ker \mu$ and hence, $\eta$ is one-one. Let $w \in W$. Since $\mu$ is onto, there exists $v \in V$ such that $\mu(v) = w$. Now, $\eta(v + \ker \mu) = \mu(v) = w$. Therefore, $\eta$ is onto.

Let $v_1 + \ker \mu, v_2 + \ker \mu \in V / \ker \mu$ and $\alpha, \beta \in F$. Then

\[
\begin{aligned}
\eta\big( (\alpha v_1 + \ker \mu) + (\beta v_2 + \ker \mu) \big) &= \eta(\alpha v_1 + \beta v_2 + \ker \mu) \\
&= \mu(\alpha v_1 + \beta v_2) \\
&= \alpha \mu(v_1) + \beta \mu(v_2) \\
&= \alpha \eta(v_1 + \ker \mu) + \beta \eta(v_2 + \ker \mu).
\end{aligned}
\]

Therefore, $\eta$ is a linear transformation. Hence $V / \ker \mu$ is isomorphic to $W$.

Proposition 22.10 If $U_1$ and $U_2$ are subhyperspaces of $V$, then $U_2 / (U_1 \cap U_2)$ is isomorphic to $(U_1 + U_2) / U_1$.


Proof Define a map $\mu : U_2 \to (U_1 + U_2) / U_1$ by $\mu(v) = v + U_1$ for every $v \in U_2$. We prove that $\mu$ is an onto linear transformation with kernel $U_1 \cap U_2$. Let $u, v \in U_2$ and $a, b \in F$. Then

\[
\begin{aligned}
\mu(au + bv) &= (au + bv) + U_1 = (au + U_1) + (bv + U_1) \\
&= a(u + U_1) + b(v + U_1) = a\mu(u) + b\mu(v).
\end{aligned}
\]

Therefore, $\mu$ is linear.
Let $v + U_1 \in (U_1 + U_2) / U_1$. Then, $v \in U_1 + U_2$. So, $v \in w_1 + w_2$ for some $w_1 \in U_1$ and $w_2 \in U_2$. Now $v + U_1 \in (w_1 + w_2) + U_1 = (w_1 + U_1) + (w_2 + U_1)$. Since $w_1 + U_1 = U_1$, this means that $v + U_1 \in \{w_2 + U_1\}$, which implies that $v + U_1 = w_2 + U_1 = \mu(w_2)$. Therefore, $\mu$ is onto.
Suppose that $\mu(v) = 0 + U_1$ where $v \in U_2$. Then, $v \in v + 0 \subseteq v + U_1 = 0 + U_1 = U_1$ implies $v \in U_1$, so $v \in U_1 \cap U_2$. On the other hand, $v' \in U_1 \cap U_2$ implies $\mu(v') = v' + U_1 = U_1$ and hence, $\ker \mu = U_1 \cap U_2$. The rest follows from Theorem 22.8.


Corollary 22.3 If $U_1$ and $U_2$ are complementary subhyperspaces of a hypervector space $V$, then $U_2$ is isomorphic to $V / U_1$.


Proof The proof follows from Proposition 22.10.


Theorem 22.9 Let $(V, +, \cdot)$ be a (weak) hypervector space over $F$ and let $U$ be a non-empty set such that there is a bijection $f : V \to U$. Then $U$ is a (weak) hypervector space over $F$ under the hyperoperation $\oplus$ and the external operation $\circ$ induced by $f$, defined as follows:

\[
u \oplus v = f\left( f^{-1}(u) + f^{-1}(v) \right) \quad \text{and} \quad a \circ u = f\left( a \cdot f^{-1}(u) \right)
\]

for all $u, v \in U$ and $a \in F$.
Furthermore, $f$ is an isomorphism between $V$ and $U$.


Proof First, we show that $f$ gives rise to a canonical hypergroup structure for the set $U$. Let $u_1, u_2, u_3 \in U$. Then

\[
\begin{aligned}
u_1 \oplus (u_2 \oplus u_3) &= f\left[ f^{-1}(u_1) + f^{-1}\left( f\left( f^{-1}(u_2) + f^{-1}(u_3) \right) \right) \right] \\
&= f\left[ f^{-1}(u_1) + f^{-1}(u_2) + f^{-1}(u_3) \right] \\
&= f\left[ f^{-1}\left( f\left( f^{-1}(u_1) + f^{-1}(u_2) \right) \right) + f^{-1}(u_3) \right] \\
&= (u_1 \oplus u_2) \oplus u_3.
\end{aligned}
\]

Let $u_1, u_2 \in U$. Then $f^{-1}(u_2) \in f^{-1}(u_1) + V$, and so $f^{-1}(u_2) \in f^{-1}(u_1) + v$ for some $v \in V$. This gives $u_2 \in f\left( f^{-1}(u_1) + v \right) = u_1 \oplus f(v) \subseteq u_1 \oplus U$. Similarly, we can show that $U \subseteq U \oplus u_1$. Hence, $(U, \oplus)$ is a hypergroup. It is easy to verify that $(U, \oplus)$ is a canonical hypergroup. Now it is clear that $f$ is an isomorphism between the hypergroups $(V, +)$ and $(U, \oplus)$. Further, $U$ becomes a (weak) hypervector space, which follows from the fact that $f$ is a homomorphism and that $V$ is a (weak) hypervector space. In this case, $f$ is a (weak) linear transformation and hence an isomorphism between the (weak) hypervector spaces $V$ and $U$.

From now on in this section, a weak hypervector space $V$ stands for a torsion-free weak hypervector space which has a basis.


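Proposition 22.11 Let $V$ and $W$ be two weak hypervector spaces over $F$ and $L(V, W)$ denote the collection of all weak linear transformations from $V$ to $W$. For any $\mu_1, \mu_2 \in L(V, W)$, we define the hyperoperation $\oplus$ by

\[
\mu_1 \oplus \mu_2 = \{\mu \in L(V, W) : \mu(v) \in \mu_1(v) + \mu_2(v) \text{ for all } v \in V\},
\quad \text{where } (\mu_1 \oplus \mu_2)(v) = \mu_1(v) + \mu_2(v),
\]

and the scalar multiplication $\cdot : F \times L(V, W) \to L(V, W)$ by $(a \cdot \mu)(v) = a\mu(v)$ for all $\mu \in L(V, W)$, $a \in F$, and $v \in V$.

Then $L(V, W)$ is a weak hypervector space over $F$ with respect to the hyperoperation addition and the scalar multiplication defined above.


Proof For $\mu_1, \mu_2, \mu_3 \in L(V, W)$,

\[
\begin{aligned}
((\mu_1 \oplus \mu_2) \oplus \mu_3)(v) &= (\mu_1 \oplus \mu_2)(v) + \mu_3(v) \\
&= (\mu_1(v) + \mu_2(v)) + \mu_3(v) \\
&= \mu_1(v) + (\mu_2(v) + \mu_3(v)) \\
&= \mu_1(v) + (\mu_2 \oplus \mu_3)(v) \\
&= (\mu_1 \oplus (\mu_2 \oplus \mu_3))(v)
\end{aligned}
\]

for all $v \in V$.
This implies $(\mu_1 \oplus \mu_2) \oplus \mu_3 = \mu_1 \oplus (\mu_2 \oplus \mu_3)$. So, $(L(V, W), \oplus)$ is a semihypergroup. Let $\mu_1, \mu_2 \in L(V, W)$. Then

\[
(\mu_1 \oplus \mu_2)(v) = \mu_1(v) + \mu_2(v) = \mu_2(v) + \mu_1(v) = (\mu_2 \oplus \mu_1)(v)
\]

for all $v \in V$.

Let $\mu_1 \in L(V, W)$ and take $0(v) = 0$ for all $v \in V$. Then $0 \in L(V, W)$. Now for any $v \in V$, $(\mu_1 \oplus 0)(v) = (0 \oplus \mu_1)(v) = \mu_1(v) + 0(v) = \mu_1(v)$. Clearly, $0 \in \mu_1 - \mu_1$ as $0 \in \mu_1(v) - \mu_1(v)$ for all $v \in V$. Now, suppose $h \in \mu_1 \oplus \mu_2$. Then $h(v) \in \mu_1(v) + \mu_2(v)$ for all $v \in V$. Then $\mu_1(v) \in h(v) - \mu_2(v)$ and $\mu_2(v) \in -\mu_1(v) + h(v)$ for all $v \in V$. Therefore $(L(V, W), \oplus)$ is a canonical hypergroup.

Let $\alpha, \beta \in F$ and $\mu_1, \mu_2 \in L(V, W)$. Then

\[
\begin{aligned}
\alpha \cdot (\mu_1 \oplus \mu_2) &= \{\mu \in L(V, W) : \mu(v) \in \alpha(\mu_1(v) + \mu_2(v)) \text{ for all } v \in V\} \\
&\subseteq \{\mu \in L(V, W) : \mu(v) \in \alpha\mu_1(v) + \alpha\mu_2(v) \text{ for all } v \in V\} \\
&= \alpha \cdot \mu_1 \oplus \alpha \cdot \mu_2.
\end{aligned}
\]

On the other hand,

\[
\begin{aligned}
(\alpha + \beta) \cdot \mu_1 &= \{\mu \in L(V, W) : \mu(v) \in (\alpha + \beta)\mu_1(v) \text{ for all } v \in V\} \\
&\subseteq \{\mu \in L(V, W) : \mu(v) \in \alpha\mu_1(v) + \beta\mu_1(v) \text{ for all } v \in V\} \\
&= \alpha \cdot \mu_1 \oplus \beta \cdot \mu_1.
\end{aligned}
\]

In a similar way, we can verify the remaining axioms for weak hypervector spaces. Thus $(L(V, W), \oplus, \cdot)$ is a weak hypervector space over $F$.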


Example 24 


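Consider the weak hypervector space $\mathbb{F}_2^2$ over $\mathbb{F}_2$ from Example 21. Define $\mu_0 : \mathbb{F}_2^2 \to \mathbb{F}_2$ by $\mu_0(0) = \mu_0(u_1) = \mu_0(u_2) = \mu_0(u_3) = 0$ and $\mu_3 : \mathbb{F}_2^2 \to \mathbb{F}_2$ by $\mu_3(0) = 0$, $\mu_3(u_2) = 0$, $\mu_3(u_1) = \mu_3(u_3) = 1$. Then $L(\mathbb{F}_2^2, \mathbb{F}_2) = \{\mu_0, \mu_1, \mu_2, \mu_3\}$ is a weak hypervector space.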
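That $L(\mathbb{F}_2^2, \mathbb{F}_2)$ has exactly the four elements listed in Example 24 can be confirmed by enumerating all sixteen maps $\mathbb{F}_2^2 \to \mathbb{F}_2$ and keeping those satisfying Definition 22.15. The sketch assumes the Krasner model of $\mathbb{F}_2$ and the coordinatewise hyperaddition of Example 16; the helper names are hypothetical.

```python
from itertools import product

def hadd(a, b):
    """Hyperaddition on {0, 1}, assuming F_2 is the Krasner hyperfield (1 + 1 = {0, 1})."""
    if a == 0:
        return {b}
    if b == 0:
        return {a}
    return {0, 1}

VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # 0, u1, u2, u3

def vadd(p, q):
    """Coordinatewise hypersum on F_2^2."""
    return {(x, y) for x in hadd(p[0], q[0]) for y in hadd(p[1], q[1])}

def smul(a, p):
    return (a * p[0], a * p[1])

def is_weak_linear(mu):
    """Check Definition 22.15 for a map given as a dict vector -> scalar."""
    homogeneous = all(mu[smul(a, p)] == a * mu[p]
                      for a in (0, 1) for p in VECTORS)
    subadditive = all({mu[t] for t in vadd(p, q)} <= hadd(mu[p], mu[q])
                      for p in VECTORS for q in VECTORS)
    return homogeneous and subadditive

# All 16 set maps F_2^2 -> F_2, then the weak linear ones among them.
candidates = [dict(zip(VECTORS, values))
              for values in product((0, 1), repeat=len(VECTORS))]
weak_linear_maps = [mu for mu in candidates if is_weak_linear(mu)]

# Record each surviving map by its values on (u1, u2, u3).
signatures = sorted((mu[(0, 1)], mu[(1, 0)], mu[(1, 1)]) for mu in weak_linear_maps)
print(len(weak_linear_maps))
print(signatures)
```

The four surviving value patterns on $(u_1, u_2, u_3)$ are $(0,0,0)$, $(0,1,1)$, $(1,0,1)$ and $(1,1,1)$, that is, $\mu_0$, $\mu_1$, $\mu_3$ and $\mu_2$ respectively.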


Definition 22.17 In a finite dimensional (weak) hypervector space V, an ordered 
basis is a finite sequence of vectors which forms a basis for V. 


Theorem 22.10 Let $\{v_1, v_2, \ldots, v_n\}$ be an ordered basis for $V$. If $\{w_1, w_2, \ldots, w_n\}$ is an ordered set of elements in $W$ with at most one non-zero $w_i$, then there exists a unique linear transformation $\varphi$ from $V$ to $W$ such that $\varphi(v_i) = w_i$ for every $1 \leq i \leq n$.


Proof Let $v \in V$. Then there exist $a_1, a_2, \ldots, a_n$ such that $v \in \sum_{j=1}^{n} a_j v_j$. Let $\{w_1, w_2, \ldots, w_n\}$ be elements in $W$ with $w_1 = 0$, $w_2 = 0$, $\ldots$, $w_{i-1} = 0$, $w_{i+1} = 0$, $\ldots$, $w_n = 0$ and $w_i \neq 0$. Now define $\varphi : V \to W$ by $\varphi(v) = a_i w_i$. It is obvious that $\varphi$ is well-defined. Now,

\[
\varphi(v_j) = \begin{cases} w_i, & \text{if } j = i; \\ 0, & \text{if } j \neq i. \end{cases}
\]

To prove $\varphi$ is linear, let $u, v \in V$. Then $u \in \sum_{j=1}^{n} a_j v_j$ and $v \in \sum_{j=1}^{n} b_j v_j$ for some $a_j, b_j$ $(1 \leq j \leq n)$ implies $u + v \subseteq \sum_{j=1}^{n} a_j v_j + \sum_{j=1}^{n} b_j v_j$. This gives $\varphi(u + v) \subseteq \varphi(u) + \varphi(v)$. Similarly, $\varphi(du) = d\varphi(u)$ for all $d \in F$.

Let $\varphi' : V \to W$ be a weak linear transformation such that $\varphi'(v_j) = w_j$ for every $j$. Let $v \in V$ with $v \in \sum_{j=1}^{n} a_j v_j$. Then

\[
\varphi'(v) \in \varphi'\left( \sum_{j=1}^{n} a_j v_j \right) \subseteq \sum_{j=1}^{n} a_j \varphi'(v_j) \quad (\text{since } \varphi' \text{ is weak linear})
\]
\[
= a_i w_i = \varphi(v).
\]

Therefore, $\varphi' = \varphi$.


Definition 22.18 Let $V$ be a weak hypervector space over a hyperfield $F$. A (weak) linear functional is a (weak) linear transformation from $V$ to $F$.


Corollary 22.4 Let $\{v_1, v_2, \ldots, v_n\}$ be an ordered basis for $V$. If $\{a_1, a_2, \ldots, a_n\}$ is an ordered set of scalars with at most one $a_i \neq 0$, then there exists a unique linear functional $\varphi$ on $V$ such that $\varphi(v_i) = a_i$ for every $1 \leq i \leq n$.


Proof It follows from Theorem 22.10.


For a hypervector space V, we denote L(V, F) by V*. 


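Theorem 22.11 Let $\{v_1, v_2, \ldots, v_n\}$ be a basis of a hypervector space $V$. Then there exists a unique basis $B^* = \{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ of $V^*$ such that $\dim(V^*) = \dim(V)$. Furthermore, we have

\[
f \in \sum_{i=1}^{n} f(v_i) \varphi_i, \quad \text{where } f \text{ is a linear functional,}
\]

and for each $x \in V$, we have

\[
x \in \sum_{i=1}^{n} \varphi_i(x) v_i.
\]


Proof Define

\[
\varphi_i(v_j) = \delta_{ij} = \begin{cases} 1, & \text{if } i = j; \\ 0, & \text{if } i \neq j. \end{cases}
\]

Then by Corollary 22.4, $\varphi_i$ is a weak linear functional. We prove that $B^* = \{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ is linearly independent. Suppose that $f = 0$ with $f \in \sum_{i=1}^{n} a_i \varphi_i$ for some $a_i$ $(1 \leq i \leq n)$. Then $0 = f(v_j) \in \sum_{i=1}^{n} a_i \varphi_i(v_j)$ for each $j$. This gives $0 = f(v_j) \in \sum_{i=1}^{n} a_i \delta_{ij} = a_j$ for each $j$, which implies $a_j = 0$ for $1 \leq j \leq n$. Therefore, $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ is linearly independent.

Now we show that $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ spans $V^*$. Let $f \in V^*$. Put $k_i = f(v_i)$. For any $x \in V$, we have $x \in \sum_{i=1}^{n} a_i v_i$. Then,

\[
\varphi_j(x) \in \varphi_j\left( \sum_{i=1}^{n} a_i v_i \right) \subseteq \sum_{i=1}^{n} a_i \varphi_j(v_i) = \sum_{i=1}^{n} a_i \delta_{ij} = a_j.
\]

So, $\varphi_j(x) = a_j$. Therefore, $x \in \sum_{j=1}^{n} \varphi_j(x) v_j$.

We have

\[
f(x) \in f\left( \sum_{i=1}^{n} a_i v_i \right) = \sum_{i=1}^{n} a_i f(v_i) \quad (\text{since } f \text{ is a linear functional})
\]
\[
= \sum_{i=1}^{n} \varphi_i(x) f(v_i).
\]

So,

\[
f(x) \in \sum_{i=1}^{n} \varphi_i(x) f(v_i) = \sum_{i=1}^{n} k_i \varphi_i(x) \quad \text{for every } x.
\]

Therefore, $f \in \sum_{i=1}^{n} k_i \varphi_i = \sum_{i=1}^{n} f(v_i) \varphi_i$. This shows that $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ spans $V^*$.

Furthermore, the $\varphi_i$ are unique and hence $B^*$ is a basis. Thus, $\dim(V^*) = \dim(V)$.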


Example 25 


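From Example 24, $(\mathbb{F}_2^2)^* = \{\mu_0, \mu_1, \mu_2, \mu_3\}$. Now we define a map $\Gamma : (\mathbb{F}_2^2)^* \to \mathbb{F}_2^2$ by $\Gamma(\mu_0) = 0$, $\Gamma(\mu_1) = u_1$, $\Gamma(\mu_2) = u_2$, and $\Gamma(\mu_3) = u_3$. Then $\Gamma$ is an isomorphism.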


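Theorem 22.12 Let $V$ and $W$ be two hypervector spaces with $\dim(V) = m$ and $\dim(W) = n$. Then $\dim(L(V, W)) = mn$.


Proof Let $\{v_1, v_2, \ldots, v_m\}$ and $\{w_1, w_2, \ldots, w_n\}$ be bases of $V$ and $W$, respectively. For $1 \leq i \leq n$ and $1 \leq j, k \leq m$, define

\[
T_{ij}(v_k) = \begin{cases} 0, & \text{if } j \neq k; \\ w_i, & \text{if } j = k \end{cases} \;=\; \delta_{jk} w_i,
\]

where $\delta_{jk}$ is as defined in the proof of Theorem 22.11. Then by Theorem 22.10, $T_{ij}$ is a weak linear transformation. We prove that $B = \{T_{ij} : 1 \leq i \leq n, 1 \leq j \leq m\}$ is a basis for $L(V, W)$. Suppose $f = 0 \in \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} T_{ij}$, where $a_{ij} \in F$. Now for $1 \leq k \leq m$,

\[
\begin{aligned}
f(v_k) = 0 &\in \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} T_{ij}(v_k) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} \delta_{jk} w_i \\
&= \sum_{i=1}^{n} a_{ik} w_i.
\end{aligned}
\]

That is, $0 \in \sum_{i=1}^{n} a_{ik} w_i$. Since $\{w_1, w_2, \ldots, w_n\}$ is linearly independent, we have $a_{ik} = 0$ $(1 \leq i \leq n)$ for every $k$. Therefore, $B$ is linearly independent.

Next, let $T \in L(V, W)$. Now for $1 \leq k \leq m$, $T(v_k) \in W$, and so $T(v_k) \in \sum_{i=1}^{n} a_{ik} w_i$. For any $x \in V$, we have $x \in \sum_{j=1}^{m} d_j v_j$, $d_j \in F$ $(1 \leq j \leq m)$. Then,

\[
\begin{aligned}
T(x) \in T\left( \sum_{j=1}^{m} d_j v_j \right) &\subseteq \sum_{j=1}^{m} d_j T(v_j) \\
&\subseteq \sum_{j=1}^{m} d_j \sum_{i=1}^{n} a_{ij} w_i \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} d_j w_i.
\end{aligned}
\]

Also, for $1 \leq i \leq n$, $1 \leq j \leq m$,

\[
T_{ij}(x) \in T_{ij}\left( \sum_{k=1}^{m} d_k v_k \right) \subseteq \sum_{k=1}^{m} d_k T_{ij}(v_k) = \sum_{k=1}^{m} d_k \delta_{jk} w_i = d_j w_i.
\]

Therefore, $T(x) \in \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} T_{ij}(x)$, and so $T \in \sum_{j=1}^{m} \sum_{i=1}^{n} a_{ij} T_{ij}$. Further, we can observe that the $a_{ij}$'s are unique and so $B$ is a basis. Therefore, $\dim L(V, W) = mn$.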


Example 26 


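Consider Example 24. A basis for $\mathbb{F}_2^2$ is $\{u_1, u_2\}$ and a basis for $L(\mathbb{F}_2^2, \mathbb{F}_2)$ is $\{\mu_1, \mu_3\}$. Then $\dim L(\mathbb{F}_2^2, \mathbb{F}_2) = 2 = \dim(\mathbb{F}_2^2) \dim(\mathbb{F}_2)$.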


22.6 Annihilators in Hypervector Spaces 


In this section, we discuss some properties of annihilators of linear functionals. We introduce the notion of a projection on a weak hypervector space and prove related results.


Definition 22.19 Let $V$ be a hypervector space and $\emptyset \neq U \subseteq V$. The annihilator of $U$ is the set of all linear functionals $f$ on $V$ such that $f(v) = 0$ for every $v \in U$, and it is denoted by $U^0$.


Example 27
Consider Example 24. Let $V = \mathbb{F}_2^2$ and $U = \{0, u_2\}$. Then $U^0 = \{\mu_0, \mu_3\}$.
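The annihilator in Example 27 can be read off by filtering the four functionals of Example 24. The sketch below hard-codes those functionals as dictionaries (an encoding chosen here for illustration, with hypothetical names `FUNCTIONALS` and `annihilator`) and, following the example, uses the weak linear functionals of Example 24 in place of linear ones.

```python
ZERO, U1, U2, U3 = (0, 0), (0, 1), (1, 0), (1, 1)

# The four functionals of Example 24, given by their values on 0, u1, u2, u3.
FUNCTIONALS = {
    "mu0": {ZERO: 0, U1: 0, U2: 0, U3: 0},
    "mu1": {ZERO: 0, U1: 0, U2: 1, U3: 1},
    "mu2": {ZERO: 0, U1: 1, U2: 1, U3: 1},
    "mu3": {ZERO: 0, U1: 1, U2: 0, U3: 1},
}

def annihilator(subset):
    """Names of all listed functionals that vanish on every vector of `subset`."""
    return sorted(name for name, mu in FUNCTIONALS.items()
                  if all(mu[v] == 0 for v in subset))

print(annihilator([ZERO, U2]))   # the subset U of Example 27
print(annihilator([ZERO, U1]))
```

For $U = \{0, u_2\}$ the result consists of $\mu_0$ and $\mu_3$, matching the example; for $\{0, u_1\}$ it consists of $\mu_0$ and $\mu_1$.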


Proposition 22.12 $U^0$ is a subhyperspace of $V^*$.


Proof $U^0 \neq \emptyset$, since $f = 0 \in U^0$. For any $f, g \in U^0$, $(f - g)(x) = f(x) - g(x) = 0$ for every $x \in U$, and we get $f - g \subseteq U^0$. For $a \in F$, we have $(af)(x) = af(x) = 0$ for every $x \in U$. Hence, $af \in U^0$.


Theorem 22.13 Let $V$ be a hypervector space over the hyperfield $F$ and $U$ a subhyperspace of $V$. Then $\dim(U) + \dim(U^0) = \dim(V)$.


Proof Let $B = \{s_1, s_2, \ldots, s_m\}$ be a basis of $U$. Then by Lemma 22.1, $B$ is contained in a basis $B' = \{s_1, s_2, \ldots, s_m, s_{m+1}, \ldots, s_n\}$ of $V$. Let $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$ be the dual basis of $V^*$. We prove that $\{\varphi_{m+1}, \varphi_{m+2}, \ldots, \varphi_n\}$ is a basis of $U^0$. Clearly, $\varphi_k(s_i) = 0$ for every $1 \leq i \leq m$ and $m + 1 \leq k \leq n$. For any $x \in U$, we have $x \in \sum_{i=1}^{m} a_i s_i$. So, $\varphi_k(x) \in \sum_{i=1}^{m} a_i \varphi_k(s_i) = 0$ implies $\varphi_k(x) = 0$ for every $m + 1 \leq k \leq n$, which shows that $\varphi_k \in U^0$. Trivially, $\{\varphi_{m+1}, \varphi_{m+2}, \ldots, \varphi_n\}$ is linearly independent. Let $f \in U^0 \subseteq V^*$. By Theorem 22.11, we get

\[
f \in \sum_{i=1}^{n} f(s_i) \varphi_i = \sum_{i=1}^{m} f(s_i) \varphi_i + \sum_{i=m+1}^{n} f(s_i) \varphi_i = \sum_{i=m+1}^{n} f(s_i) \varphi_i.
\]

Therefore, $f \in \operatorname{span}\{\varphi_{m+1}, \varphi_{m+2}, \ldots, \varphi_n\}$. Hence, $\{\varphi_{m+1}, \varphi_{m+2}, \ldots, \varphi_n\}$ is a basis for $U^0$. Furthermore, $\dim(U^0) = n - m = \dim(V) - \dim(U)$.


Theorem 22.14 Let $U_1$ and $U_2$ be two subhyperspaces of a hypervector space $V$ over the hyperfield $F$. Then $U_1 = U_2$ if and only if $U_1^0 = U_2^0$.


Proof If $U_1 = U_2$, then $U_1^0 = U_2^0$. Conversely, suppose that $U_1^0 = U_2^0$. On the contrary, assume that $U_1 \neq U_2$. Then, without loss of generality, there exists $w_1 \in U_1 \setminus U_2$. Now we can find a linear functional $\varphi$ such that $\varphi(w_1) \neq 0$ and $\varphi(v) = 0$ for every $v \in U_2$. This gives $\varphi \in U_2^0$ but $\varphi \notin U_1^0$, a contradiction.


The following lemma is easy.


Lemma 22.2 Let $U_1 \subseteq U_2$ be two subhyperspaces of a hypervector space $V$. Then $U_2^0 \subseteq U_1^0$.


Theorem 22.15 Let $U_1$ and $U_2$ be two subhyperspaces of a hypervector space $V$ over $F$. Then, $(U_1 + U_2)^0 = U_1^0 \cap U_2^0$.


Proof Since $U_1, U_2 \subseteq U_1 + U_2$, by Lemma 22.2, we have $(U_1 + U_2)^0 \subseteq U_1^0$ and $(U_1 + U_2)^0 \subseteq U_2^0$, which implies that $(U_1 + U_2)^0 \subseteq U_1^0 \cap U_2^0$. Let $\varphi \in U_1^0 \cap U_2^0$. Then $\varphi(v) = 0$ for every $v \in U_1 \cup U_2$. Now for any $x \in U_1 + U_2$, we have $x \in w_1 + w_2$ with $w_1 \in U_1$ and $w_2 \in U_2$, and so $\varphi(x) \in \varphi(w_1) + \varphi(w_2) = 0$ implies $\varphi(x) = 0$ for every $x \in U_1 + U_2$. Therefore, $\varphi \in (U_1 + U_2)^0$.


Definition 22.20 Let $V$ be a hypervector space. A (weak) linear transformation $P : V \to V$ with $P^2 = P \circ P = P$ is called a (weak) projection on $V$.


Theorem 22.16 Let $V$ be a hypervector space and $P$ be a projection on $V$ with $\operatorname{Im}(P) = M$ and $\ker P = N$. Then $V = M \oplus N$.


Proof It is obvious that $M$, $N$ are subhyperspaces of $V$ and $M + N \subseteq V$. Let $v \in V$. Then $P(v - P(v)) = P(v) - P(P(v)) = P(v) - P^2(v) = P(v) - P(v)$. That is, $0 \in P(v) - P(v) = P(v - P(v))$. Therefore, there exists $t \in v - P(v)$ such that $P(t) = 0$. Now $t \in N$ and $P(v) \in M$ and $v \in t + P(v) \subseteq M + N$. It follows that $V \subseteq M + N$. Hence, $V = M + N$. Let $u \in M \cap N$. Then $u = P(x)$ for some $x \in V$ and $P(u) = 0$. Now $0 = P(u) = P(P(x)) = P^2(x) = P(x) = u$. Therefore, $u = 0$, and $V = M \oplus N$ by Theorem 22.3.


Theorem 22.17 Let $V$ be a hypervector space and $M$, $N$ be subhyperspaces of $V$ such that $V = M \oplus N$. Then there exists a weak projection $P$ on $V$ such that $\operatorname{Im}(P) = M$ and $\ker P = N$.


Proof Since $V = M \oplus N$, for each $u \in V$ we have $u \in w_1 + w_2$ for unique $w_1 \in M$ and $w_2 \in N$. Define $P : V \to V$ by $P(u) = w_1$. Clearly $P$ is well defined. For any $w_1 \in M$, we have $w_1 \in w_1 + 0$ and so, $P(w_1) = w_1$. Similarly, for any $w_2 \in N$, $w_2 \in 0 + w_2$, so $P(w_2) = 0$. Also, for $t \in w_1 + w_2$, where $w_1 \in M$ and $w_2 \in N$, we have $P(t) = w_1$ and $P(P(t)) = P(w_1) = w_1 = P(t)$. Therefore, $P^2 = P$.
Let $a \in F$ and $u, u' \in V$. Let $u \in w_1 + w_2$ and $u' \in w'_1 + w'_2$ for unique $w_1, w'_1 \in M$ and $w_2, w'_2 \in N$. Now, $u + u' \subseteq (w_1 + w'_1) + (w_2 + w'_2)$. So,

\[
\begin{aligned}
P(u + u') &\subseteq P\big( (w_1 + w'_1) + (w_2 + w'_2) \big) \\
&= w_1 + w'_1 \\
&= P(w_1) + P(w'_1) \\
&= P(u) + P(u').
\end{aligned}
\]

In fact, for any $t \in (w_1 + w'_1) + (w_2 + w'_2)$, we have $t \in w + w'$ with $w \in w_1 + w'_1 \subseteq M$ and $w' \in w_2 + w'_2 \subseteq N$, so we get $P(t) = w$ and hence $P\big( (w_1 + w'_1) + (w_2 + w'_2) \big) = w_1 + w'_1$.

For $a \in F$, $u \in V$, we have $u \in w_1 + w_2$ and so, $P(au) = aw_1 = aP(w_1) = aP(u)$. Therefore, $P$ is a weak linear transformation and $P^2 = P$.

22.7 Conclusion


We have proved properties of subhyperspaces involving notions such as direct sums and complements, and have given some examples and counter-examples. These results generalize the corresponding aspects of classical vector spaces. We have established isomorphism theorems and annihilator properties of linear functionals in hypervector spaces. As future scope, one can study hypernorms in hypervector spaces and in similar structures. Guillen and Harikrishnan [4] and Harikrishnan et al. [5] have introduced various norms in different normed linear spaces, which can be generalized to hypervector spaces. Also, Joy and Thomas [7] have introduced the concept of lattice vector space; hence, we wish to extend these notions to algebraic hyperstructures such as hyperlattices, as discussed by Pallavi et al. [9, 10].

Chapter 23
Generalized Lie Triple Derivations
of Trivial Extension Algebras

23.1 Introduction


Let A be an algebra (over a fixed commutative unital ring R) with center Z(A). A linear mapping d : A → A is said to be a derivation if d(uv) = d(u)v + ud(v) holds for all u, v ∈ A; it is said to be a Lie derivation if d([u, v]) = [d(u), v] + [u, d(v)] holds for all u, v ∈ A; and it is said to be a Lie triple derivation if d([[u, v], w]) = [[d(u), v], w] + [[u, d(v)], w] + [[u, v], d(w)] holds for all u, v, w ∈ A, where [u, v] = uv − vu is the usual Lie product. Lie (triple) derivations on various algebras have been studied by many authors over the past decades (see [3, 4, 11, 12, 15-20, 22, 23]).
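To see how these three notions are related, the following short computation (a standard fact, written out here as an illustration in the notation above) checks that every derivation d is a Lie derivation; applying the Lie derivation identity twice then shows that every Lie derivation is a Lie triple derivation.

```latex
\begin{aligned}
d([u, v]) &= d(uv) - d(vu) \\
          &= d(u)v + u\,d(v) - d(v)u - v\,d(u) \\
          &= \bigl(d(u)v - v\,d(u)\bigr) + \bigl(u\,d(v) - d(v)u\bigr) \\
          &= [d(u), v] + [u, d(v)], \\[4pt]
d([[u, v], w]) &= [d([u, v]), w] + [[u, v], d(w)] \\
               &= [[d(u), v], w] + [[u, d(v)], w] + [[u, v], d(w)].
\end{aligned}
```

The first block uses only the Leibniz rule and linearity of d; the second uses only the Lie derivation identity and bilinearity of the bracket.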

The notion of derivation can be further extended to generalized derivation. A linear mapping G_d : A → A is called a generalized derivation if there exists a derivation d : A → A such that G_d(uv) = G_d(u)v + ud(v) for all u, v ∈ A. Similarly, a linear mapping G_d : A → A is called a generalized Lie triple derivation if there exists a Lie triple derivation d : A → A such that G_d([[u, v], w]) = [[G_d(u), v], w] + [[u, d(v)], w] + [[u, v], d(w)] for all u, v, w ∈ A. A generalized Lie triple derivation is called proper if it can be written as the sum of a generalized derivation and a linear mapping from the algebra into its center that vanishes on all second commutators.

In the year 1991, Brešar [10] introduced the notion of a generalized derivation. Many results obtained earlier for derivations have been extended to generalized derivations (see [1, 2, 5-7, 9, 13, 14, 21] and references therein). In the year 2007, Hvala [14] introduced the notion of a generalized Lie derivation and proved that each generalized Lie derivation of a prime ring is the sum of a generalized derivation and a central map which vanishes on all commutators. Motivated by the above result, Benkovič [7] investigated the problem of characterizing generalized Lie derivations of triangular algebras. Mokhtari et al. in [20] provided some conditions under which
every Lie derivation of a trivial extension algebra is proper, that is, it can be expressed 
as the sum of a derivation and a central map which vanishes on all commutators. Ben- 
nis et al. [9] introduced the concept of Lie generalized derivation and extended the 
above result in the setting of Lie generalized derivations of trivial extension algebras. 
The first author together with Jabeen [5] introduced and studied generalized Lie triple 
derivations of triangular algebras. Recently, the authors [1] gave a description of gen- 
eralized Lie triple derivations of generalized matrix algebras. Of course, the class of 
generalized Lie triple derivations covers both the classes of Lie triple derivations and 
Lie generalized derivations. Therefore, studying generalized Lie triple derivations 
enables us to treat both important classes of Lie triple derivations and Lie generalized 
derivations simultaneously. 

The objective of this chapter is to investigate the problem of describing the form 
of generalized Lie triple derivations of trivial extension algebras. More precisely, we 
prove that under certain appropriate conditions every generalized Lie triple derivation 
of a trivial extension algebra is proper. Finally, as a consequence of our main results 
(Theorem 23.1 and Theorem 23.2), we describe the form of generalized Lie triple 
derivations of triangular algebras. 


23.2 Preliminaries 


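From now on, we assume that $\mathcal{A}$ is a unital algebra and $\mathcal{T}$ an $\mathcal{A}$-bimodule.
Recall that a trivial extension algebra is a unital algebra $\mathcal{A} \times \mathcal{T}$ together with
the pairwise addition, scalar multiplication, and the multiplication defined by
$(u, t_1)(v, t_2) = (uv, ut_2 + t_1v)$ for all $u, v \in \mathcal{A}$, $t_1, t_2 \in \mathcal{T}$. It is denoted by $\mathcal{A} \ltimes \mathcal{T}$.
The center of $\mathcal{A} \ltimes \mathcal{T}$ is given by

$$\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}) = \{(u, t) \mid u \in \mathcal{Z}(\mathcal{A}),\ [v, t] = 0 = [u, t'] \text{ for all } v \in \mathcal{A},\ t' \in \mathcal{T}\}.$$

In addition, if $\mathcal{A}$ has a nontrivial idempotent $e_1$ such that $e_1te_2 = t$ for all $t \in \mathcal{T}$,
where $e_2 = 1 - e_1$, then we see that

$$\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}) = \{(u, 0) \mid u \in \mathcal{Z}(\mathcal{A}),\ [u, t] = 0 \text{ for all } t \in \mathcal{T}\}. \tag{23.1}$$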


We now state some known results, which shall be used in proving the main results 
of this chapter. 


Lemma 23.1 ([8, Proposition 2.5]) Let $\mathcal{A}$ be an algebra with a nontrivial idempotent
$e_1$ and set $e_2 = 1 - e_1$. For an $\mathcal{A}$-bimodule $\mathcal{T}$, the following statements are
equivalent:

(i) For every $t \in \mathcal{T}$, $e_1te_2 = t$.
(ii) For every $t \in \mathcal{T}$, $e_2t = 0 = te_1$.
(iii) For every $t \in \mathcal{T}$, $e_1t = t = te_2$.
(iv) For every $t \in \mathcal{T}$ and $u \in \mathcal{A}$, $ut = e_1ue_1t$ and $tu = te_2ue_2$.


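Note that if $\mathcal{A}$ contains a nontrivial idempotent $e_1$, then it can be expressed as
$\mathcal{A} = e_1\mathcal{A}e_1 + e_1\mathcal{A}e_2 + e_2\mathcal{A}e_1 + e_2\mathcal{A}e_2$, where $e_1\mathcal{A}e_1$ and $e_2\mathcal{A}e_2$ are subalgebras
of $\mathcal{A}$ with identity elements $e_1$ and $e_2$, respectively, $e_1\mathcal{A}e_2$ is an $(e_1\mathcal{A}e_1, e_2\mathcal{A}e_2)$-bimodule
and $e_2\mathcal{A}e_1$ is an $(e_2\mathcal{A}e_2, e_1\mathcal{A}e_1)$-bimodule. Hence each element $u \in \mathcal{A}$
can be written as $u = e_1ue_1 + e_1ue_2 + e_2ue_1 + e_2ue_2$. We define two projection
maps $\pi_{e_1\mathcal{A}e_1} : \mathcal{A} \ltimes \mathcal{T} \to \mathcal{A}$ and $\pi_{e_2\mathcal{A}e_2} : \mathcal{A} \ltimes \mathcal{T} \to \mathcal{A}$ by $\pi_{e_1\mathcal{A}e_1}(u, t) = e_1ue_1$ and
$\pi_{e_2\mathcal{A}e_2}(u, t) = e_2ue_2$.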
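
The decomposition $u = e_1ue_1 + e_1ue_2 + e_2ue_1 + e_2ue_2$ can be made concrete. The sketch below assumes $\mathcal{A}$ is the algebra of $2 \times 2$ integer matrices with $e_1$ the matrix unit $E_{11}$; this choice of algebra and the helper names are assumptions made only for illustration.

```python
def mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]

E1 = [[1, 0], [0, 0]]  # nontrivial idempotent e_1 (assumed: matrix unit E_11)
E2 = [[0, 0], [0, 1]]  # e_2 = 1 - e_1

def peirce_components(u):
    """Return the four corners e_i u e_j of u, keyed by (i, j)."""
    idem = {1: E1, 2: E2}
    return {(i, j): mul(mul(idem[i], u), idem[j])
            for i in (1, 2) for j in (1, 2)}

def recombine(parts):
    """Sum the four corners back into a single matrix."""
    total = [[0, 0], [0, 0]]
    for part in parts.values():
        total = add(total, part)
    return total

u = [[5, -2], [7, 3]]
parts = peirce_components(u)
```

Here each corner $e_iue_j$ keeps exactly one entry of `u`, and `recombine(parts)` returns `u` again.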


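Lemma 23.2 ([9, Lemma 3.11]) Let $\mathcal{A}$ be an algebra with a nontrivial idempotent
$e_1$ such that $e_1te_2 = t$ for all $t \in \mathcal{T}$. Then $[\mathcal{Z}(\mathcal{A}), t] = 0$ for all $t \in \mathcal{T}$, if one of the
following statements holds:


(i) $\mathcal{Z}(e_1\mathcal{A}e_1) = \pi_{e_1\mathcal{A}e_1}(\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}))$ and $e_1\mathcal{A}e_2$ is a faithful right $e_2\mathcal{A}e_2$-module.
(ii) $\mathcal{Z}(e_2\mathcal{A}e_2) = \pi_{e_2\mathcal{A}e_2}(\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}))$ and $e_1\mathcal{A}e_2$ is a faithful left $e_1\mathcal{A}e_1$-module.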


The following lemma can be seen in [3] and [20]. 


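Lemma 23.3 Let $\delta : \mathcal{A} \ltimes \mathcal{T} \to \mathcal{A} \ltimes \mathcal{T}$ be a linear mapping. Then $\delta$ is of the form

$$\delta(u, t) = (f_{\mathcal{A}}(u) + h_{\mathcal{A}}(t),\ f_{\mathcal{T}}(u) + h_{\mathcal{T}}(t)), \tag{23.2}$$

where $f_{\mathcal{A}} : \mathcal{A} \to \mathcal{A}$; $h_{\mathcal{A}} : \mathcal{T} \to \mathcal{A}$; $f_{\mathcal{T}} : \mathcal{A} \to \mathcal{T}$; and $h_{\mathcal{T}} : \mathcal{T} \to \mathcal{T}$ are linear mappings.
Moreover,


• $\delta$ is a Lie triple derivation if and only if

(i) $f_{\mathcal{A}}$ and $f_{\mathcal{T}}$ are Lie triple derivations;

(ii) $h_{\mathcal{A}}([[u_1, u_2], t]) = [[u_1, u_2], h_{\mathcal{A}}(t)]$ and $h_{\mathcal{A}}([[u_1, t], u_2]) = [[u_1, h_{\mathcal{A}}(t)], u_2]$
for all $u_1, u_2 \in \mathcal{A}$, $t \in \mathcal{T}$;

(iii) $h_{\mathcal{T}}([[u_1, u_2], t]) = [[f_{\mathcal{A}}(u_1), u_2], t] + [[u_1, f_{\mathcal{A}}(u_2)], t] + [[u_1, u_2], h_{\mathcal{T}}(t)]$ and
$h_{\mathcal{T}}([[u_1, t], u_2]) = [[f_{\mathcal{A}}(u_1), t], u_2] + [[u_1, h_{\mathcal{T}}(t)], u_2] + [[u_1, t], f_{\mathcal{A}}(u_2)]$
for all $u_1, u_2 \in \mathcal{A}$, $t \in \mathcal{T}$;

(iv) $[[u, h_{\mathcal{A}}(t_1)], t_2] + [[u, t_1], h_{\mathcal{A}}(t_2)] = 0$ and $[[h_{\mathcal{A}}(t_1), t_2] + [t_1, h_{\mathcal{A}}(t_2)], u] = 0$
for all $u \in \mathcal{A}$ and $t_1, t_2 \in \mathcal{T}$.


• $\delta$ is a derivation if and only if

(a) $f_{\mathcal{A}}$ and $f_{\mathcal{T}}$ are derivations;

(b) $h_{\mathcal{A}}(ut) = uh_{\mathcal{A}}(t)$ and $h_{\mathcal{A}}(tu) = h_{\mathcal{A}}(t)u$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$;

(c) $h_{\mathcal{T}}(ut) = f_{\mathcal{A}}(u)t + uh_{\mathcal{T}}(t)$ and $h_{\mathcal{T}}(tu) = h_{\mathcal{T}}(t)u + tf_{\mathcal{A}}(u)$ for all $u \in \mathcal{A}$ and
$t \in \mathcal{T}$;

(d) $h_{\mathcal{A}}(t_1)t_2 + t_1h_{\mathcal{A}}(t_2) = 0$ for all $t_1, t_2 \in \mathcal{T}$.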
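Lemma 23.4 Suppose that there exists a nontrivial idempotent $e_1$ in $\mathcal{A}$ such that
$e_1te_2 = t$ for all $t \in \mathcal{T}$. Then the following statements hold:


(i) If $\Delta : \mathcal{A} \to \mathcal{A}$ is a generalized derivation, then $[\Delta(e_1ue_2) + \Delta(e_2ue_1), t] = 0$
for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.

(ii) If $F_{\mathcal{A}} : \mathcal{A} \to \mathcal{A}$ is a generalized Lie triple derivation, then $[F_{\mathcal{A}}(e_1ue_2), t] = 0$
and $[F_{\mathcal{A}}(e_2ue_1), t] = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.


Proof (i) Assume that $\Delta$ is a generalized derivation with associated derivation
$d$. Then $td(e_1) = 0 = d(e_2)t$ for all $t \in \mathcal{T}$. Indeed, using Lemma 23.1, $td(e_1) =
t(d(e_1)e_1 + e_1d(e_1)) = 0$. Similarly, we can prove that $d(e_2)t = 0$. Since $d$ is a
derivation, using Lemma 23.1, we have

$$[d(e_1ue_2) + d(e_2ue_1), t] = [d(e_1)e_1ue_2 + e_1d(e_1ue_2) + d(e_2)e_2ue_1 + e_2d(e_2ue_1), t] = ud(e_2)t - tud(e_1) = 0.$$

Similarly, $[\Delta(e_1ue_2) + \Delta(e_2ue_1), t] = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.
(ii) Assume that $F_{\mathcal{A}}$ is a generalized Lie triple derivation with associated Lie triple
derivation $f_{\mathcal{A}}$. Then for all $u \in \mathcal{A}$, we see that

$$\begin{aligned}
F_{\mathcal{A}}(e_1ue_2) &= F_{\mathcal{A}}([[e_1, e_1ue_2], e_2]) \\
&= [[F_{\mathcal{A}}(e_1), e_1ue_2], e_2] + [[e_1, f_{\mathcal{A}}(e_1ue_2)], e_2] + [[e_1, e_1ue_2], f_{\mathcal{A}}(e_2)] \\
&= [F_{\mathcal{A}}(e_1)e_1ue_2 - e_1ue_2F_{\mathcal{A}}(e_1), e_2] + [e_1f_{\mathcal{A}}(e_1ue_2) - f_{\mathcal{A}}(e_1ue_2)e_1, e_2] + [e_1ue_2, f_{\mathcal{A}}(e_2)] \\
&= F_{\mathcal{A}}(e_1)e_1ue_2 - e_1ue_2F_{\mathcal{A}}(e_1)e_2 - e_2F_{\mathcal{A}}(e_1)e_1ue_2 + e_1f_{\mathcal{A}}(e_1ue_2)e_2 \\
&\quad + e_2f_{\mathcal{A}}(e_1ue_2)e_1 + e_1ue_2f_{\mathcal{A}}(e_2) - f_{\mathcal{A}}(e_2)e_1ue_2.
\end{aligned}$$

Hence, using Lemma 23.1, we have $[F_{\mathcal{A}}(e_1ue_2), t] = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.
Similarly, $[F_{\mathcal{A}}(e_2ue_1), t] = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.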


23.3 Generalized Lie Triple Derivations 


Inspired by Brešar's idea, Bennis et al. [9] introduced the concept of a Lie generalized
derivation as follows: A linear mapping $G_d : \mathcal{A} \to \mathcal{A}$ is called a Lie generalized
derivation if there exists a Lie derivation $d : \mathcal{A} \to \mathcal{A}$ such that $G_d([u, v]) =
[G_d(u), v] + [u, d(v)]$ for all $u, v \in \mathcal{A}$. Further, they showed that under certain
restrictions every Lie generalized derivation of a trivial extension algebra is the sum
of a generalized derivation and a central map which vanishes on all commutators. It
is not difficult to verify that a Lie generalized derivation is a generalized Lie triple
derivation but the converse is not true in general. For example, if $\delta : \mathcal{A} \to \mathcal{A}$ is a Lie
triple derivation but not necessarily a Lie derivation (note that such a map exists, see
[22, Counterexample 3.5]), then the map $G_\delta(x) = \lambda x + \delta(x)$ for all $x \in \mathcal{A}$, where
$\lambda \in \mathcal{Z}(\mathcal{A})$, is a generalized Lie triple derivation but not necessarily a Lie generalized
derivation.

In this section, we extend the main results of [9] to the setting of generalized 
Lie triple derivations of trivial extension algebras. Let us begin our investigation 
with the following lemma which gives the structures of generalized derivations and 
generalized Lie triple derivations of trivial extension algebras. 


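Lemma 23.5 Let $G_\delta : \mathcal{A} \ltimes \mathcal{T} \to \mathcal{A} \ltimes \mathcal{T}$ be a linear mapping. Then $G_\delta$ is of the
form

$$G_\delta(u, t) = (F_{\mathcal{A}}(u) + H_{\mathcal{A}}(t),\ F_{\mathcal{T}}(u) + H_{\mathcal{T}}(t)), \tag{23.3}$$

where $F_{\mathcal{A}} : \mathcal{A} \to \mathcal{A}$; $H_{\mathcal{A}} : \mathcal{T} \to \mathcal{A}$; $F_{\mathcal{T}} : \mathcal{A} \to \mathcal{T}$ and $H_{\mathcal{T}} : \mathcal{T} \to \mathcal{T}$ are linear
mappings. Moreover,


• $G_\delta$ is a generalized Lie triple derivation with associated Lie triple derivation $\delta$
(as given in Lemma 23.3) if and only if

(i) $F_{\mathcal{A}}$, $F_{\mathcal{T}}$ are generalized Lie triple derivations with associated Lie triple derivations
$f_{\mathcal{A}}$, $f_{\mathcal{T}}$, respectively;

(ii) $H_{\mathcal{A}}([[u_1, u_2], t_3]) = [[u_1, u_2], h_{\mathcal{A}}(t_3)]$, $H_{\mathcal{A}}([[u_1, t_2], u_3]) = [[u_1, h_{\mathcal{A}}(t_2)], u_3]$
and $H_{\mathcal{A}}([[t_1, u_2], u_3]) = [[H_{\mathcal{A}}(t_1), u_2], u_3]$ for all $u_1, u_2, u_3 \in \mathcal{A}$ and
$t_1, t_2, t_3 \in \mathcal{T}$;

(iii) $H_{\mathcal{T}}([[u_1, u_2], t_3]) = [[F_{\mathcal{A}}(u_1), u_2], t_3] + [[u_1, f_{\mathcal{A}}(u_2)], t_3] + [[u_1, u_2], h_{\mathcal{T}}(t_3)]$,
$H_{\mathcal{T}}([[u_1, t_2], u_3]) = [[F_{\mathcal{A}}(u_1), t_2], u_3] + [[u_1, h_{\mathcal{T}}(t_2)], u_3] + [[u_1, t_2], f_{\mathcal{A}}(u_3)]$
and
$H_{\mathcal{T}}([[t_1, u_2], u_3]) = [[H_{\mathcal{T}}(t_1), u_2], u_3] + [[t_1, f_{\mathcal{A}}(u_2)], u_3] + [[t_1, u_2], f_{\mathcal{A}}(u_3)]$
for all $u_1, u_2, u_3 \in \mathcal{A}$ and $t_1, t_2, t_3 \in \mathcal{T}$;

(iv) $[[u_1, h_{\mathcal{A}}(t_2)], t_3] + [[u_1, t_2], h_{\mathcal{A}}(t_3)] = 0$, $[[H_{\mathcal{A}}(t_1), u_2], t_3] + [[t_1, u_2], h_{\mathcal{A}}(t_3)] = 0$
and $[[H_{\mathcal{A}}(t_1), t_2], u_3] + [[t_1, h_{\mathcal{A}}(t_2)], u_3] = 0$ for all $u_1, u_2, u_3 \in \mathcal{A}$ and
$t_1, t_2, t_3 \in \mathcal{T}$.


• $G_\delta$ is a generalized derivation with associated derivation $\delta$ (as given in Lemma
23.3) if and only if

(a) $F_{\mathcal{A}}$, $F_{\mathcal{T}}$ are generalized derivations with associated derivations $f_{\mathcal{A}}$, $f_{\mathcal{T}}$, respectively;

(b) $H_{\mathcal{A}}(ut) = uh_{\mathcal{A}}(t)$ and $H_{\mathcal{A}}(tu) = H_{\mathcal{A}}(t)u$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$;

(c) $H_{\mathcal{T}}(ut) = F_{\mathcal{A}}(u)t + uh_{\mathcal{T}}(t)$ and $H_{\mathcal{T}}(tu) = H_{\mathcal{T}}(t)u + tf_{\mathcal{A}}(u)$ for all $u \in \mathcal{A}$
and $t \in \mathcal{T}$;

(d) $H_{\mathcal{A}}(t_1)t_2 + t_1h_{\mathcal{A}}(t_2) = 0$ for all $t_1, t_2 \in \mathcal{T}$.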


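Proof We only prove the assertion for generalized Lie triple derivations. The second
assertion can be proved in a similar way. Let $G_\delta$ be a generalized Lie triple derivation
with associated Lie triple derivation $\delta$ (as given in Lemma 23.3). Then

$$G_\delta([[T_1, T_2], T_3]) = [[G_\delta(T_1), T_2], T_3] + [[T_1, \delta(T_2)], T_3] + [[T_1, T_2], \delta(T_3)] \tag{23.4}$$

for all $T_1, T_2, T_3 \in \mathcal{A} \ltimes \mathcal{T}$.
Taking $T_1 = (u_1, 0)$, $T_2 = (u_2, 0)$ and $T_3 = (u_3, 0)$, we have

$$G_\delta([[T_1, T_2], T_3]) = (F_{\mathcal{A}}([[u_1, u_2], u_3]),\ F_{\mathcal{T}}([[u_1, u_2], u_3]))$$

and

$$\begin{aligned}
&[[G_\delta(T_1), T_2], T_3] + [[T_1, \delta(T_2)], T_3] + [[T_1, T_2], \delta(T_3)] \\
&= ([[F_{\mathcal{A}}(u_1), u_2], u_3],\ [[F_{\mathcal{T}}(u_1), u_2], u_3]) + ([[u_1, f_{\mathcal{A}}(u_2)], u_3],\ [[u_1, f_{\mathcal{T}}(u_2)], u_3]) \\
&\quad + ([[u_1, u_2], f_{\mathcal{A}}(u_3)],\ [[u_1, u_2], f_{\mathcal{T}}(u_3)]) \\
&= ([[F_{\mathcal{A}}(u_1), u_2], u_3] + [[u_1, f_{\mathcal{A}}(u_2)], u_3] + [[u_1, u_2], f_{\mathcal{A}}(u_3)], \\
&\qquad [[F_{\mathcal{T}}(u_1), u_2], u_3] + [[u_1, f_{\mathcal{T}}(u_2)], u_3] + [[u_1, u_2], f_{\mathcal{T}}(u_3)]).
\end{aligned}$$

Hence, it follows from (23.4) that $F_{\mathcal{A}}$, $F_{\mathcal{T}}$ are generalized Lie triple derivations with
associated Lie triple derivations $f_{\mathcal{A}}$, $f_{\mathcal{T}}$, respectively.
Choosing $T_1 = (u_1, 0)$, $T_2 = (u_2, 0)$ and $T_3 = (0, t_3)$, we get

$$G_\delta([[T_1, T_2], T_3]) = (H_{\mathcal{A}}([[u_1, u_2], t_3]),\ H_{\mathcal{T}}([[u_1, u_2], t_3]))$$

and

$$\begin{aligned}
&[[G_\delta(T_1), T_2], T_3] + [[T_1, \delta(T_2)], T_3] + [[T_1, T_2], \delta(T_3)] \\
&= ([[u_1, u_2], h_{\mathcal{A}}(t_3)],\ [[F_{\mathcal{A}}(u_1), u_2], t_3] + [[u_1, f_{\mathcal{A}}(u_2)], t_3] + [[u_1, u_2], h_{\mathcal{T}}(t_3)]).
\end{aligned}$$

Thus, using (23.4), we get

$$\begin{aligned}
H_{\mathcal{A}}([[u_1, u_2], t_3]) &= [[u_1, u_2], h_{\mathcal{A}}(t_3)]; \\
H_{\mathcal{T}}([[u_1, u_2], t_3]) &= [[F_{\mathcal{A}}(u_1), u_2], t_3] + [[u_1, f_{\mathcal{A}}(u_2)], t_3] + [[u_1, u_2], h_{\mathcal{T}}(t_3)].
\end{aligned}$$

Similarly, we can obtain

$$\begin{aligned}
H_{\mathcal{A}}([[u_1, t_2], u_3]) &= [[u_1, h_{\mathcal{A}}(t_2)], u_3]; \\
H_{\mathcal{T}}([[u_1, t_2], u_3]) &= [[F_{\mathcal{A}}(u_1), t_2], u_3] + [[u_1, h_{\mathcal{T}}(t_2)], u_3] + [[u_1, t_2], f_{\mathcal{A}}(u_3)]; \\
H_{\mathcal{A}}([[t_1, u_2], u_3]) &= [[H_{\mathcal{A}}(t_1), u_2], u_3]; \\
H_{\mathcal{T}}([[t_1, u_2], u_3]) &= [[H_{\mathcal{T}}(t_1), u_2], u_3] + [[t_1, f_{\mathcal{A}}(u_2)], u_3] + [[t_1, u_2], f_{\mathcal{A}}(u_3)].
\end{aligned}$$

Considering $T_1 = (u_1, 0)$, $T_2 = (0, t_2)$ and $T_3 = (0, t_3)$ in (23.4), we obtain

$$(0, 0) = (0,\ [[u_1, h_{\mathcal{A}}(t_2)], t_3] + [[u_1, t_2], h_{\mathcal{A}}(t_3)]).$$

Therefore, we have

$$[[u_1, h_{\mathcal{A}}(t_2)], t_3] + [[u_1, t_2], h_{\mathcal{A}}(t_3)] = 0.$$

Similarly, one can obtain

$$\begin{aligned}
[[H_{\mathcal{A}}(t_1), u_2], t_3] + [[t_1, u_2], h_{\mathcal{A}}(t_3)] &= 0; \\
[[H_{\mathcal{A}}(t_1), t_2], u_3] + [[t_1, h_{\mathcal{A}}(t_2)], u_3] &= 0.
\end{aligned}$$

Conversely, suppose that $G_\delta$ is a linear mapping of the form (23.3) satisfying conditions
(i)-(iv). Then it is easy to see that $G_\delta$ is a generalized Lie triple derivation
of $\mathcal{A} \ltimes \mathcal{T}$ with associated Lie triple derivation $\delta$.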


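The first main result of the chapter is stated as follows:


Theorem 23.1 Let $\mathcal{A} \ltimes \mathcal{T}$ be a trivial extension algebra. Suppose there exists a
nontrivial idempotent $e_1$ in $\mathcal{A}$ such that $e_1te_2 = t$ for all $t \in \mathcal{T}$. Then every generalized
Lie triple derivation $G_\delta$ of the form (23.3) is proper if and only if the following
statements hold:


1. There exists a linear mapping $\ell_{\mathcal{A}} : \mathcal{A} \to \mathcal{Z}(\mathcal{A})$ such that

(i) $F_{\mathcal{A}} - \ell_{\mathcal{A}}$ is a generalized derivation,
(ii) $[\ell_{\mathcal{A}}(e_1ue_1), t] = 0 = [\ell_{\mathcal{A}}(e_2ue_2), t]$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.


2. $F_{\mathcal{T}}(e_2ue_1) = 0$ and $e_2H_{\mathcal{A}}(t)e_1 = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.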


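Proof Suppose that $G_\delta$ is a generalized Lie triple derivation of the form (23.3) and
there exists a linear mapping $\ell_{\mathcal{A}} : \mathcal{A} \to \mathcal{Z}(\mathcal{A})$ such that the statements (1) and (2)
hold.

Let us define two mappings $\Delta, \ell : \mathcal{A} \ltimes \mathcal{T} \to \mathcal{A} \ltimes \mathcal{T}$ by

$$\Delta(u, t) = ((F_{\mathcal{A}} - \ell_{\mathcal{A}})(u) + H_{\mathcal{A}}(t),\ F_{\mathcal{T}}(u) + H_{\mathcal{T}}(t)) \quad \text{and} \quad \ell(u, t) = (\ell_{\mathcal{A}}(u), 0).$$

Then, $\Delta$ and $\ell$ are linear mappings and $G_\delta = \Delta + \ell$. Moreover, it follows from (23.1)
and $\ell_{\mathcal{A}}(\mathcal{A}) \subseteq \mathcal{Z}(\mathcal{A})$ that $\ell(\mathcal{A} \ltimes \mathcal{T}) \subseteq \mathcal{Z}(\mathcal{A} \ltimes \mathcal{T})$. Now, we prove $\Delta$ is a generalized
derivation through the following claims.


Claim 1. $H_{\mathcal{A}}(ut) = uh_{\mathcal{A}}(t)$, $H_{\mathcal{A}}(tu) = H_{\mathcal{A}}(t)u$ and $H_{\mathcal{A}}(t_1)t_2 + t_1h_{\mathcal{A}}(t_2) = 0$.


Using the assumption (2), we get $H_{\mathcal{A}}(t) = H_{\mathcal{A}}([[t, e_2], e_2]) = [[H_{\mathcal{A}}(t), e_2], e_2]
= e_1H_{\mathcal{A}}(t)e_2$ for all $t \in \mathcal{T}$. Then $e_1H_{\mathcal{A}}(t)e_1 = 0 = e_2H_{\mathcal{A}}(t)e_2$. Moreover,
$H_{\mathcal{A}}(t) = H_{\mathcal{A}}([[e_1, t], e_2]) = [[e_1, h_{\mathcal{A}}(t)], e_2] = e_1h_{\mathcal{A}}(t)e_2 + e_2h_{\mathcal{A}}(t)e_1$. Again,
using the fact that $e_2H_{\mathcal{A}}(t)e_1 = 0$, we get $e_2h_{\mathcal{A}}(t)e_1 = 0$ for all $t \in \mathcal{T}$. Thus,
$H_{\mathcal{A}}(t) = e_1h_{\mathcal{A}}(t)e_2$ for all $t \in \mathcal{T}$. Now, we have $H_{\mathcal{A}}(ut) = e_1h_{\mathcal{A}}(ut)e_2 = uh_{\mathcal{A}}(t)$.
Similarly, we can show that $H_{\mathcal{A}}(tu) = H_{\mathcal{A}}(t)u$. Using Lemma 23.1, we get $H_{\mathcal{A}}(t_1)t_2
+ t_1h_{\mathcal{A}}(t_2) = e_1H_{\mathcal{A}}(t_1)e_2t_2 + t_1e_1h_{\mathcal{A}}(t_2)e_2 = 0$.

Claim 2. $F_{\mathcal{T}}$ is a generalized derivation.


Let us define a linear mapping $\Upsilon_{\mathcal{T}}$ as follows: $\Upsilon_{\mathcal{T}}(u) = F_{\mathcal{T}}(e_1ue_1) + F_{\mathcal{T}}(e_2ue_2)$
for all $u \in \mathcal{A}$. We assert that $\Upsilon_{\mathcal{T}}$ is an inner derivation. Since $F_{\mathcal{T}}$ is a generalized
Lie triple derivation, we get $0 = F_{\mathcal{T}}([[e_1ue_1, e_2ue_2], e_2]) = F_{\mathcal{T}}(e_1ue_1)u +
uf_{\mathcal{T}}(e_2ue_2) = -uF_{\mathcal{T}}(e_2ue_2) - f_{\mathcal{T}}(e_1ue_1)u$. Taking $u = e_2ue_2 + e_1$ in the above
equation, we get $F_{\mathcal{T}}(e_2ue_2) = f_{\mathcal{T}}(e_2)u$. Similarly, if we take $u = e_1ue_1 + e_2$, we
get $F_{\mathcal{T}}(e_1ue_1) = -uf_{\mathcal{T}}(e_2)$. Hence $\Upsilon_{\mathcal{T}}(u) = -uf_{\mathcal{T}}(e_2) + f_{\mathcal{T}}(e_2)u$. We now show
that $g_{\mathcal{T}}$ defined by $u \mapsto F_{\mathcal{T}}(e_1ue_2)$ is a generalized derivation. We have

$$\begin{aligned}
g_{\mathcal{T}}(uv) &= F_{\mathcal{T}}(e_1uve_2) \\
&= F_{\mathcal{T}}(e_1ue_1ve_2) + F_{\mathcal{T}}(e_1ue_2ve_2) \\
&= F_{\mathcal{T}}([[e_1u, e_1ve_2], e_2]) + F_{\mathcal{T}}([[e_1u, e_2ve_2], e_2]) \\
&= [[F_{\mathcal{T}}(e_1u), e_1ve_2], e_2] + [[e_1u, f_{\mathcal{T}}(e_1ve_2)], e_2] + [[e_1u, e_1ve_2], f_{\mathcal{T}}(e_2)] \\
&\quad + [[F_{\mathcal{T}}(e_1u), e_2ve_2], e_2] + [[e_1u, f_{\mathcal{T}}(e_2ve_2)], e_2] + [[e_1u, e_2ve_2], f_{\mathcal{T}}(e_2)] \\
&= e_1uf_{\mathcal{T}}(e_1ve_2) + F_{\mathcal{T}}(e_1u)e_2ve_2 + e_1uf_{\mathcal{T}}(e_2ve_2) \\
&= uf_{\mathcal{T}}(e_1ve_2) + (F_{\mathcal{T}}(e_1ue_1) + F_{\mathcal{T}}(e_1ue_2))v + uf_{\mathcal{T}}(e_2ve_2) \\
&= F_{\mathcal{T}}(e_1ue_2)v + u(f_{\mathcal{T}}(e_1ve_2) - f_{\mathcal{T}}(e_2)v + f_{\mathcal{T}}(e_2ve_2)) \\
&= g_{\mathcal{T}}(u)v + u(f_{\mathcal{T}}(ve_2) - f_{\mathcal{T}}(e_2)v).
\end{aligned}$$

Since $F_{\mathcal{T}}(u) = \Upsilon_{\mathcal{T}}(u) + g_{\mathcal{T}}(u)$, $F_{\mathcal{T}}$ is a generalized derivation.


Claim 3. $H_{\mathcal{T}}(ut) = (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)t + uh_{\mathcal{T}}(t)$ and $H_{\mathcal{T}}(tu) = H_{\mathcal{T}}(t)u +
t(f_{\mathcal{A}} - \ell_{\mathcal{A}})(u)$.


Using Lemma 23.1, Lemma 23.5 and the fact that $F_{\mathcal{A}} - \ell_{\mathcal{A}}$ is a generalized
derivation with associated derivation $f_{\mathcal{A}} - \ell_{\mathcal{A}}$, we see that

$$\begin{aligned}
H_{\mathcal{T}}(ut) &= H_{\mathcal{T}}([[e_1ue_1, t], e_2]) \\
&= [[F_{\mathcal{A}}(e_1ue_1), t], e_2] + [[e_1ue_1, h_{\mathcal{T}}(t)], e_2] + [[e_1ue_1, t], f_{\mathcal{A}}(e_2)] \\
&= [F_{\mathcal{A}}(e_1ue_1), t] + [e_1ue_1, h_{\mathcal{T}}(t)] + [ut, f_{\mathcal{A}}(e_2)] \\
&= [(F_{\mathcal{A}} - \ell_{\mathcal{A}})(e_1ue_1), t] + [\ell_{\mathcal{A}}(e_1ue_1), t] + [e_1ue_1, h_{\mathcal{T}}(t)] \\
&= (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)t + [\ell_{\mathcal{A}}(e_1ue_1), t] + uh_{\mathcal{T}}(t).
\end{aligned} \tag{23.5}$$

Using the hypothesis (1)(ii), we get $H_{\mathcal{T}}(ut) = (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)t + uh_{\mathcal{T}}(t)$. Similarly,
one can prove that $H_{\mathcal{T}}(tu) = H_{\mathcal{T}}(t)u + t(f_{\mathcal{A}} - \ell_{\mathcal{A}})(u)$. Thus, using Lemma 23.5
and Claims 1-3, we conclude that $\Delta$ is a generalized derivation.

Conversely, suppose that $G_\delta$ is proper, that is, $G_\delta = \Delta + \ell$, where $\Delta$ is a generalized
derivation and $\ell$ is a linear mapping from $\mathcal{A} \ltimes \mathcal{T}$ into its center. Then, it
follows from (23.1) that $\ell(u, t) = (\ell_{\mathcal{A}}(u), 0)$ ($(u, t) \in \mathcal{A} \ltimes \mathcal{T}$) for some linear mapping
$\ell_{\mathcal{A}} : \mathcal{A} \to \mathcal{Z}(\mathcal{A})$ with $[\ell_{\mathcal{A}}(u), t] = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$. On the other
hand, $G_\delta - \ell = \Delta$ is a generalized derivation. Hence, by Lemma 23.5, $F_{\mathcal{A}} - \ell_{\mathcal{A}}$,
$F_{\mathcal{T}}$ are generalized derivations and $H_{\mathcal{A}}$ satisfies $H_{\mathcal{A}}(tu) = H_{\mathcal{A}}(t)u$ for all $u \in \mathcal{A}$
and $t \in \mathcal{T}$. Therefore, using Lemma 23.1, one can easily prove that $F_{\mathcal{T}}(e_2ue_1) = 0$
and $e_2H_{\mathcal{A}}(t)e_1 = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.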


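Applying Theorem 23.1, we now obtain the second main result of the chapter.
Let $\mathcal{W}_{\mathcal{A}}$ denote the smallest subalgebra of $\mathcal{A}$ containing all idempotents and second
commutators.


Theorem 23.2 Assume that $\mathcal{A} \ltimes \mathcal{T}$ is a trivial extension algebra and there exists a
nontrivial idempotent $e_1$ in $\mathcal{A}$ such that $e_1te_2 = t$ for all $t \in \mathcal{T}$. Then every generalized
Lie triple derivation $G_\delta$ of the form (23.3) is proper if the following statements
hold:


1. Every generalized Lie triple derivation of $\mathcal{A}$ is proper.
2. $F_{\mathcal{T}}(e_2ue_1) = 0$ and $e_2H_{\mathcal{A}}(t)e_1 = 0$ for all $u \in \mathcal{A}$ and $t \in \mathcal{T}$.
3. One of the following three conditions holds:


(i) $\mathcal{W}_{e_1\mathcal{A}e_1} = e_1\mathcal{A}e_1$ and $\mathcal{W}_{e_2\mathcal{A}e_2} = e_2\mathcal{A}e_2$.
(ii) $\mathcal{Z}(e_1\mathcal{A}e_1) = \pi_{e_1\mathcal{A}e_1}(\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}))$ and $e_1\mathcal{A}e_2$ is a faithful right $e_2\mathcal{A}e_2$-module.
(iii) $\mathcal{Z}(e_2\mathcal{A}e_2) = \pi_{e_2\mathcal{A}e_2}(\mathcal{Z}(\mathcal{A} \ltimes \mathcal{T}))$ and $e_1\mathcal{A}e_2$ is a faithful left $e_1\mathcal{A}e_1$-module.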


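Proof Let $G_\delta$ be a generalized Lie triple derivation of the form (23.3). Since $F_{\mathcal{A}}$
is a generalized Lie triple derivation on $\mathcal{A}$, by hypothesis (1), there exists a linear
mapping $\ell_{\mathcal{A}} : \mathcal{A} \to \mathcal{Z}(\mathcal{A})$ vanishing on all second commutators of $\mathcal{A}$ such that $F_{\mathcal{A}} -
\ell_{\mathcal{A}}$ is a generalized derivation. It is sufficient to show that $\ell_{\mathcal{A}}$ satisfies the condition
(1)(ii) of Theorem 23.1, that is, $[\ell_{\mathcal{A}}(e_1ue_1), t] = 0 = [\ell_{\mathcal{A}}(e_2ue_2), t]$ for all $u \in \mathcal{A}$
and $t \in \mathcal{T}$.

Set $\mathcal{A}' = \{e_1ue_1 \in e_1\mathcal{A}e_1 \mid [\ell_{\mathcal{A}}(e_1ue_1), t] = 0 \text{ for all } t \in \mathcal{T}\}$. First, we prove
that $\mathcal{A}'$ is a subalgebra of $e_1\mathcal{A}e_1$. Note that $\mathcal{A}'$ is an $\mathcal{R}$-submodule of $e_1\mathcal{A}e_1$, since $\ell_{\mathcal{A}}$ is
an $\mathcal{R}$-linear mapping. To prove that $\mathcal{A}'$ is closed under multiplication, we establish the
following identity:

$$[\ell_{\mathcal{A}}(e_1uve_1), t] = [\ell_{\mathcal{A}}(e_1ue_1), vt] + [\ell_{\mathcal{A}}(e_1ve_1), ut] \tag{23.6}$$

for all $u, v \in \mathcal{A}$, $t \in \mathcal{T}$. From the expression (23.5), we have

$$H_{\mathcal{T}}(ut) = (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)t + [\ell_{\mathcal{A}}(e_1ue_1), t] + uh_{\mathcal{T}}(t). \tag{23.7}$$

Replacing $u$ by $uv$ in (23.7), we get

$$H_{\mathcal{T}}(uvt) = (F_{\mathcal{A}} - \ell_{\mathcal{A}})(uv)t + [\ell_{\mathcal{A}}(e_1uve_1), t] + uvh_{\mathcal{T}}(t). \tag{23.8}$$

Using the fact that $F_{\mathcal{A}} - \ell_{\mathcal{A}}$ is a generalized derivation with associated derivation
$f_{\mathcal{A}} - \ell_{\mathcal{A}}$ and Eq. (23.7), we have

$$\begin{aligned}
H_{\mathcal{T}}(uvt) &= (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)vt + [\ell_{\mathcal{A}}(e_1ue_1), vt] + uh_{\mathcal{T}}(vt) \\
&= (F_{\mathcal{A}} - \ell_{\mathcal{A}})(u)vt + [\ell_{\mathcal{A}}(e_1ue_1), vt] + u(f_{\mathcal{A}} - \ell_{\mathcal{A}})(v)t + [\ell_{\mathcal{A}}(e_1ve_1), ut] + uvh_{\mathcal{T}}(t) \\
&= (F_{\mathcal{A}} - \ell_{\mathcal{A}})(uv)t + [\ell_{\mathcal{A}}(e_1ue_1), vt] + [\ell_{\mathcal{A}}(e_1ve_1), ut] + uvh_{\mathcal{T}}(t).
\end{aligned} \tag{23.9}$$

Comparing (23.8) and (23.9), we obtain

$$[\ell_{\mathcal{A}}(e_1uve_1), t] = [\ell_{\mathcal{A}}(e_1ue_1), vt] + [\ell_{\mathcal{A}}(e_1ve_1), ut] \quad \text{for all } u, v \in \mathcal{A},\ t \in \mathcal{T},$$

which yields the desired expression (23.6). Hence $\mathcal{A}'$ is a subalgebra of $e_1\mathcal{A}e_1$. Taking
$u = v$ in (23.6), we get

$$[\ell_{\mathcal{A}}((e_1ue_1)^2), t] = [\ell_{\mathcal{A}}(e_1ue_1), 2ut] \quad (u \in \mathcal{A},\ t \in \mathcal{T}), \tag{23.10}$$

which implies that

$$[\ell_{\mathcal{A}}((e_1ue_1)^3), t] = [\ell_{\mathcal{A}}((e_1ue_1)^2(e_1ue_1)), t] = [\ell_{\mathcal{A}}(e_1ue_1), 3u^2t]. \tag{23.11}$$

Suppose that $e_1ue_1 \in e_1\mathcal{A}e_1$ is an idempotent; that is, $(e_1ue_1)^2 = e_1ue_1$. By (23.10)
and (23.11), we arrive at $[\ell_{\mathcal{A}}(e_1ue_1), t] = [\ell_{\mathcal{A}}(3(e_1ue_1)^2 - 2(e_1ue_1)^3), t] = 0$,
which implies that $e_1ue_1 \in \mathcal{A}'$.

Moreover, $\mathcal{A}'$ contains all second commutators, since $\ell_{\mathcal{A}}$ vanishes on all second
commutators. Now, if $\mathcal{W}_{e_1\mathcal{A}e_1} = e_1\mathcal{A}e_1$, then $\mathcal{A}' = e_1\mathcal{A}e_1$, that is, $[\ell_{\mathcal{A}}(e_1ue_1), t] =
0$ for every $u \in \mathcal{A}$, $t \in \mathcal{T}$. In a similar fashion, we can show that if $\mathcal{W}_{e_2\mathcal{A}e_2} = e_2\mathcal{A}e_2$,
then $[\ell_{\mathcal{A}}(e_2ue_2), t] = 0$ for all $u \in \mathcal{A}$, $t \in \mathcal{T}$. Furthermore, in view of Lemma 23.2,
if either (3)(ii) or (3)(iii) holds, then $[\mathcal{Z}(\mathcal{A}), t] = 0$ for all $t \in \mathcal{T}$, as required.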


23.4 Application to Triangular Algebras 


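Let $\mathcal{U}$, $\mathcal{V}$ be unital algebras and $\mathcal{T}$ a nonzero $(\mathcal{U}, \mathcal{V})$-bimodule. An algebra

$$\mathcal{A} = \mathrm{Tri}(\mathcal{U}, \mathcal{T}, \mathcal{V}) = \left\{ \begin{pmatrix} u & t \\ 0 & v \end{pmatrix} \;\middle|\; u \in \mathcal{U},\ t \in \mathcal{T},\ v \in \mathcal{V} \right\}$$

under the usual matrix-like operations is called a triangular algebra. Note that every
triangular algebra $\mathrm{Tri}(\mathcal{U}, \mathcal{T}, \mathcal{V})$ can be identified with the trivial extension algebra
$(\mathcal{U} \oplus \mathcal{V}) \ltimes \mathcal{T}$ (see [3] for details).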
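
The identification of a triangular algebra with a trivial extension can be illustrated in the simplest case. The sketch below assumes $\mathcal{U} = \mathcal{V} = \mathcal{T} = \mathbb{Z}$, with $(u, v)$ acting on $t$ by $(u, v) \cdot t = ut$ on the left and $t \cdot (u, v) = tv$ on the right. These choices, and all function names, are assumptions made for illustration only. An element of $(\mathcal{U} \oplus \mathcal{V}) \ltimes \mathcal{T}$ is stored as `((u, v), t)`.

```python
def left_act(a, t):
    """Left action of a = (u, v) in U (+) V on t: only the U-part acts."""
    return a[0] * t

def right_act(t, a):
    """Right action of a = (u, v) on t: only the V-part acts."""
    return t * a[1]

def te_mul(x, y):
    """Trivial extension product (a1, t1)(a2, t2) = (a1 a2, a1 t2 + t1 a2)."""
    (a1, t1), (a2, t2) = x, y
    prod = (a1[0] * a2[0], a1[1] * a2[1])  # componentwise product in U (+) V
    return (prod, left_act(a1, t2) + right_act(t1, a2))

def to_matrix(x):
    """Send ((u, v), t) to the upper triangular matrix [[u, t], [0, v]]."""
    (u, v), t = x
    return [[u, t], [0, v]]

def mat_mul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

E1 = (1, 0)  # idempotent e_1 of U (+) V
E2 = (0, 1)  # e_2 = 1 - e_1

x = ((2, 3), 5)
y = ((-1, 4), 7)
```

With these definitions, `to_matrix(te_mul(x, y))` agrees with `mat_mul(to_matrix(x), to_matrix(y))`, and $e_1te_2 = t$ holds for every integer $t$, which is the standing idempotent hypothesis of Theorems 23.1 and 23.2 in this small case.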

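Furthermore, one can easily see that every generalized Lie triple derivation of
$(\mathcal{U} \oplus \mathcal{V}) \ltimes \mathcal{T}$ has the form

$$G_\delta((u, v), t) = (F_{\mathcal{U} \oplus \mathcal{V}}(u, v),\ F_{\mathcal{T}}(u, v) + H_{\mathcal{T}}(t)).$$

That is, $H_{\mathcal{U} \oplus \mathcal{V}}(t) = 0$ for all $t \in \mathcal{T}$. Therefore, as consequences of Theorem 23.1
and Theorem 23.2, we have the following results:


Corollary 23.1 Let $G_\delta$ be a generalized Lie triple derivation of a triangular algebra
$\mathrm{Tri}(\mathcal{U}, \mathcal{T}, \mathcal{V})$ of the form $G_\delta((u, v), t) = (F_{\mathcal{U} \oplus \mathcal{V}}(u, v),\ F_{\mathcal{T}}(u, v) + H_{\mathcal{T}}(t))$. Then
$G_\delta$ is proper if and only if there exists a linear mapping $\ell_{\mathcal{U} \oplus \mathcal{V}} : \mathcal{U} \oplus \mathcal{V} \to \mathcal{Z}(\mathcal{U} \oplus
\mathcal{V})$ such that the following statements hold:


(i) $F_{\mathcal{U} \oplus \mathcal{V}} - \ell_{\mathcal{U} \oplus \mathcal{V}}$ is a generalized derivation,
(ii) $[\ell_{\mathcal{U} \oplus \mathcal{V}}(u, v), t] = 0$ for all $(u, v) \in \mathcal{U} \oplus \mathcal{V}$ and $t \in \mathcal{T}$.


Corollary 23.2 Every generalized Lie triple derivation of a triangular algebra
$\mathrm{Tri}(\mathcal{U}, \mathcal{T}, \mathcal{V})$ is proper if the following two statements are satisfied:


(i) Every generalized Lie triple derivation on $\mathcal{U}$ (resp. on $\mathcal{V}$) is proper,
(ii) $\mathcal{W}_{\mathcal{U}} = \mathcal{U}$ and $\mathcal{W}_{\mathcal{V}} = \mathcal{V}$.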
Chapter 24
On Ideals of Compatible $\Theta\Gamma$ $N$-Group


Hamsa Nayak, Syam Prasad Kuncham, and Babushri Srinivas Kedukodi 


Mathematics Subject Classification (2010) 16Y30 


24.1 Introduction 


A nearring is a generalized ring with relaxed conditions: only one of the distributive
laws needs to hold, and the addition operation is not expected to be abelian. If
$(G, +)$ is a non-abelian group then the set of all mappings from $G$ to itself forms a
nearring (which is not a ring) under the usual operations of addition and composition
of mappings. In 1984, Bhavanari [20] introduced an extension of nearring called a
gamma nearring, wherein a set $\Gamma$ of binary operations handles the multiplication
between every pair of elements. This structure was further studied and developed
by several authors including Booth [5], Booth and Groenewald [6], Kuncham [24].
Booth and Groenewald [6] introduced concepts such as equiprime gamma nearrings
and contributed to the radical theory of gamma nearrings. Bhavanari and Kuncham
[4] studied fuzzy cosets of gamma nearrings. Booth, Groenewald and Veldsman [7],
Veldsman [25] studied different types of prime ideals of nearrings such as completely
prime, 3-prime, and equiprime ideals and formulated corresponding radicals.
Jagadeesha, Kedukodi, Kuncham [9] further related these concepts to lattice theory
by defining interval valued L-fuzzy ideals of nearrings based on t-norms and
t-conorms. Jagadeesha, Kedukodi, Kuncham [13] studied homomorphic images of
interval valued L-fuzzy ideals and proved isomorphism theorems. Bhavanari, Kuncham,
and Kedukodi [10] linked prime ideals to graph theoretic aspects of nearrings.
Deepa and Anita Kumari [22] characterized rings for which the co-maximal graph is
a line graph of some graph $G$ and determined rings for which it is Eulerian or Hamiltonian.
Amritha and Selvakumar [2] analyzed certain basic properties of the k-maximal
hypergraph of a commutative ring. $N$-Groups are modules over a nearring (Pilz
[18]). Nayak, Kuncham, and Kedukodi [15] introduced a generalization of gamma
nearring and $N$-Group called a $\Theta\Gamma$ $N$-Group. Nayak, Kuncham and Kedukodi [8,
16] and Harikrishnan, Nayak, Vadiraja Bhatta, Kedukodi and Kuncham [17] further
developed and discussed $\Theta\Gamma$ $N$-Groups. It is worth noting that a $\Theta\Gamma$ $N$-Group even
handles non-associative binary operations. This paper derives properties
of a $\Theta\Gamma$ $N$-Group $G$ when $N$ is an additive subgroup of $G$.


24.2 Preliminaries 


We refer to Pilz [18] and Koppula, Kedukodi and Kuncham [11, 12] for basic def- 
initions. For recent developments in nearrings, we refer to Kuncham, Kedukodi, 
Panackal and Bhavanari [19], Anshel and Clay [3], Rao and Veldsman [23] and 
Maxson and Meyer [14]. Computations in nearrings can be done using SONATA 
(Aichinger et al. [1]). 


Definition 24.1 Let $(G, +)$ be a group with additive identity 0. $G$ is said to be an
$N$-Group if there exists a nearring $(N, +, \cdot)$, and a mapping $N \times \theta \times G \to G$ (the
image of $(n, g) \in N \times \theta \times G$ is denoted by $n\theta g$ where $\theta$ is an operation), satisfying
the following conditions:


(i) $(n + m)\theta g = n\theta g + m\theta g$ and
(ii) $(nm)\theta g = n\theta(m\theta g)$


for all $g \in G$ and $n, m \in N$.
We denote an $N$-Group by ${}_N G$.


Definition 24.2 (Bhavanari and Kuncham [21]) Let $(M, +)$ be a group (not necessarily
abelian). Let $\Gamma$ be a non-empty set. Then $M$ is said to be a $\Gamma$-nearring if there exists a
mapping $M \times \Gamma \times M \to M$ (denote the image of $(m_1, \rho_1, m_2)$ by $m_1\rho_1m_2$ for $m_1, m_2 \in M$
and $\rho_1 \in \Gamma$) with the following conditions:


(i) $(m_1 + m_2)\rho_1m_3 = m_1\rho_1m_3 + m_2\rho_1m_3$ and
(ii) $(m_1\rho_1m_2)\rho_2m_3 = m_1\rho_1(m_2\rho_2m_3)$


for all $m_1, m_2, m_3 \in M$ and for all $\rho_1, \rho_2 \in \Gamma$.
Definition 24.3 (Nayak, Kuncham and Kedukodi [15]) Let $(G, +_G)$ be a group. $G$ is
called a $\Theta\Gamma$ $N$-Group if there exists a nearring $(N, +, \cdot)$ and there exist maps $\Theta(N \times
\Theta \times G \to G)$, $\Gamma(N \times \Gamma \times N \to N)$ containing nearring multiplication $\cdot$, $\Delta_\Gamma(N \times
\Delta_\Gamma \times G \to G)$ satisfying the following conditions:


(i) $(n + m)\theta g = n\theta g +_G m\theta g$ for all $n, m \in N$, $g \in G$, $\theta \in \Theta$ ($\theta$ is right distributive),

(ii) for every $n, m \in N$, $\gamma \in \Gamma$, there exists $\delta_\gamma \in \Delta_\Gamma$ such that $(n\gamma m)\theta g = n\delta_\gamma(m\theta g)$
for all $g \in G$, $\theta \in \Theta$ ($\theta$ is quasi-associative).


Definition 24.4 (Nayak, Kuncham and Kedukodi [15]) Let $G$ be a $\Theta\Gamma$ $N$-Group.
$H \subseteq G$ is said to be a $\Theta\Gamma$ $N$-subgroup of $G$ if $N\Theta H \subseteq H$.


24.3 Compatible $\Theta\Gamma$ $N$-Group


Definition 24.5 Let $(G, +_G)$ be a group. $G$ is called a compatible $\Theta\Gamma$ $N$-Group if
there exists a nearring $(N, +, \cdot)$ and there exist maps $\Theta(N \times \Theta \times G \to G)$, $\Gamma(N \times
\Gamma \times N \to N)$ containing nearring multiplication $\cdot$, $\Delta_\Gamma(N \times \Delta_\Gamma \times G \to G)$ satisfying
the following conditions:


(i) $N \subseteq G$ and $+$ and $+_G$ coincide in $G$ ($N$ is an additive subgroup of $G$),
(ii) $(n + m)\theta g = n\theta g + m\theta g$ for all $n, m \in N$, $g \in G$, $\theta \in \Theta$ ($\theta$ is right distributive),
(iii) for every $n, m \in N$, $\gamma \in \Gamma$, there exists $\delta_\gamma \in \Delta_\Gamma$ such that $(n\gamma m)\theta g =
n\delta_\gamma(m\theta g)$ for all $g \in G$, $\theta \in \Theta$ ($\theta$ is quasi-associative).


Definition 24.6 Let $G$ be a compatible $\Theta\Gamma$ $N$-Group. A subset $H$ of $G$ is said to be
a subgroup of $G$ if $N\Theta H \subseteq H$.


Definition 24.7 Let $H$ be a normal subgroup of a compatible $\Theta\Gamma$ $N$-Group $(G, +)$.
Then $H$ is called


(i) a left ideal of $G$ if $n\theta(x + a) - n\theta x \in H$ for all $n \in N$, $x \in G$, $a \in H$ and
$\theta \in \Theta$.
(ii) a right ideal of $G$ if $a\theta x \in H$ for all $x \in G$ and $a \in H$.
(iii) an ideal of $G$ if $H$ is both a left ideal and a right ideal.


Note 24.1 A left ideal of a compatible $\Theta\Gamma$ $N$-Group is a $\Theta\Gamma$ $N$-ideal of a $\Theta\Gamma$ $N$-Group.


First, we provide some examples of compatible $\Theta\Gamma$ $N$-Groups.
Example 1

Let $G = \{0, 1, 2, 3, 4, 5, 6, 7\}$ and $N = \{0, 2, 4, 6\}$. $(N, +)$ is defined as follows:

 + | 0 2 4 6
---+--------
 0 | 0 2 4 6
 2 | 2 4 6 0
 4 | 4 6 0 2
 6 | 6 0 2 4

Define $\Gamma = \{\gamma_1, \gamma_2\}$, $\Theta = \{\theta_1, \theta_2\}$ and $\Delta_\Gamma = \{\delta_{\gamma_1}, \delta_{\gamma_2}\}$ as follows:

 γ1 | 0 2 4 6        γ2 | 0 2 4 6
----+--------       ----+--------
 0  | 0 0 0 0        0  | 0 0 0 0
 2  | 0 4 0 4        2  | 0 6 4 2
 4  | 0 0 0 0        4  | 0 4 0 4
 6  | 0 4 0 4        6  | 0 2 4 6

 θ1 | 0 1 2 3 4 5 6 7        θ2 | 0 1 2 3 4 5 6 7
----+----------------       ----+----------------
 0  | 0 0 0 0 0 0 0 0        0  | 0 0 0 0 0 0 0 0
 2  | 0 2 4 6 0 2 4 6        2  | 0 2 0 6 0 2 0 6
 4  | 0 4 0 4 0 4 0 4        4  | 0 4 0 4 0 4 0 4
 6  | 0 6 4 2 0 6 4 2        6  | 0 6 0 2 0 6 0 2

 δγ1 | 0 1 2 3 4 5 6 7       δγ2 | 0 1 2 3 4 5 6 7
-----+----------------      -----+----------------
 0   | 0 0 0 0 0 0 0 0       0   | 0 0 0 0 0 0 0 0
 2   | 0 0 4 0 0 0 4 0       2   | 0 0 6 0 4 0 2 0
 4   | 0 0 0 0 0 0 0 0       4   | 0 0 4 0 0 0 4 0
 6   | 0 0 4 0 0 0 4 0       6   | 0 0 2 0 4 0 6 0


Then $G$ is a compatible $\Theta\Gamma$ $N$-Group. $J = \{0, 2, 4, 6\}$ is an ideal of $G$.
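
The claims of Example 1 can be checked by brute force. The sketch below transcribes the tables for $\theta_1$, $\theta_2$, $\gamma_2$ and $\delta_{\gamma_2}$ and tests right distributivity, quasi-associativity and the ideal conditions for $J$. It assumes that addition on $G$ is addition modulo 8, which the example states explicitly only for $N$; the variable and function names are illustrative. Only the pair $(\gamma_2, \delta_{\gamma_2})$ is checked, to keep the sketch short.

```python
N = [0, 2, 4, 6]
G = list(range(8))  # assumption: (G, +) is the integers modulo 8

# theta tables: row n in N, column g in G; the entry is n theta g.
THETA1 = {0: [0] * 8,
          2: [0, 2, 4, 6, 0, 2, 4, 6],
          4: [0, 4, 0, 4, 0, 4, 0, 4],
          6: [0, 6, 4, 2, 0, 6, 4, 2]}
THETA2 = {0: [0] * 8,
          2: [0, 2, 0, 6, 0, 2, 0, 6],
          4: [0, 4, 0, 4, 0, 4, 0, 4],
          6: [0, 6, 0, 2, 0, 6, 0, 2]}
# gamma_2 table: row n in N, column m in N; the entry is n gamma_2 m.
GAMMA2 = {0: {0: 0, 2: 0, 4: 0, 6: 0},
          2: {0: 0, 2: 6, 4: 4, 6: 2},
          4: {0: 0, 2: 4, 4: 0, 6: 4},
          6: {0: 0, 2: 2, 4: 4, 6: 6}}
# delta table paired with gamma_2: row n in N, column g in G.
DELTA_GAMMA2 = {0: [0] * 8,
                2: [0, 0, 6, 0, 4, 0, 2, 0],
                4: [0, 0, 4, 0, 0, 0, 4, 0],
                6: [0, 0, 2, 0, 4, 0, 6, 0]}

def right_distributive(theta):
    """(n + m) theta g == n theta g + m theta g for all n, m in N, g in G."""
    return all(theta[(n + m) % 8][g] == (theta[n][g] + theta[m][g]) % 8
               for n in N for m in N for g in G)

def quasi_associative(gamma, delta, theta):
    """(n gamma m) theta g == n delta (m theta g) for all n, m in N, g in G."""
    return all(theta[gamma[n][m]][g] == delta[n][theta[m][g]]
               for n in N for m in N for g in G)

def is_ideal(H, thetas):
    """Left and right ideal conditions of Definition 24.7 (requires H within N)."""
    left = all((theta[n][(x + a) % 8] - theta[n][x]) % 8 in H
               for theta in thetas for n in N for x in G for a in H)
    right = all(theta[a][x] in H for theta in thetas for a in H for x in G)
    return left and right

J = [0, 2, 4, 6]
```

Normality of $J$ is automatic here because addition modulo 8 is commutative, so `is_ideal` tests only the two conditions of Definition 24.7.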


Example 2 


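Consider $\mathbb{Z}_8 = \{0, 1, 2, 3, \ldots, 7\}$, the group of integers modulo 8, and a set $X =
\{a, b\}$.

Let $G = \{g \mid g : X \to \mathbb{Z}_8 \text{ and } g(a) = 0\}$ and $N = \{g \mid g : X \to \mathbb{Z}_4 \text{ and } g(a) = 0\}$.
Then $G = \{g_0, g_1, g_2, \ldots, g_7\}$, where $g_i$ is defined by $g_i(a) = 0$ and $g_i(b) = i$ for
$0 \le i \le 7$ and $N = \{g_0, g_1, g_2, g_3\}$, where $g_i$ is defined by $g_i(a) = 0$ and $g_i(b) = i$
for $0 \le i \le 3$.

Define mappings $f_0, f_1 : \mathbb{Z}_4 \to X$ by setting $f_0(i) = a$ for all $i \in \mathbb{Z}_4$ and $f_1(i) = a$
if $i \ne 3$, $f_1(i) = b$ if $i = 3$. Write $\Gamma = \{f_0, f_1\}$.

Define $\Theta = \{f_0, f_3\}$ where $f_0, f_3 : \mathbb{Z}_8 \to X$ are defined as $f_0(i) = a$ for all $i \in \mathbb{Z}_8$ and
$f_3(i) = a$ if $i \notin \{3, 7\}$, $f_3(i) = b$ if $i \in \{3, 7\}$.

Now, let $\Delta_\Gamma = \{\delta_{f_0}, \delta_{f_1}\}$ where $\delta_{f_0}, \delta_{f_1} : \mathbb{Z}_8 \to X$ by setting $\delta_{f_0}(i) = a$ for all $i \in \mathbb{Z}_8$
and $\delta_{f_1}(i) = a$ if $i \notin \{3, 7\}$, $\delta_{f_1}(i) = b$ if $i \in \{3, 7\}$.

Then $G$ is a compatible $\Theta\Gamma$ $N$-Group. $J = \{g_0, g_2, g_4, g_6\}$ is an ideal of $G$.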


Example 3 
Let (R, +) be the group. Take N = (R,+,-) anda,b,c ER. 


Define a divb= eee 

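Corresponding to the operation div, define $\mathrm{div}_c$ by $a\ \mathrm{div}_c\ b = a\ \mathrm{div}\ (bc^2)$.

Let $\Theta = \{\cdot, \mathrm{div}\}$, $\Gamma = \{\cdot, \mathrm{div}\}$ and $\Delta_\Gamma = \{\cdot, \mathrm{div}_c\}$. Clearly, the operations in $\Theta$ are
right distributive. We have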


a div,(b div c) =a div ((b div c)c”) = a (a div b) dive. 


C2 


ot 


This implies div is quasi-associative. Hence the operations in $\Theta$ are quasi-associative.
Therefore, $R$ is a compatible $\Theta\Gamma$ $R$ group. $\{0\}$ and $R$ are the only ideals of $R$.

Note 24.2 Every gamma nearring is a compatible $\Theta\Gamma$ $N$-Group (take $N = G$, $\Theta =
\Gamma$ and $\Delta_\Gamma = \Gamma$).


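Proposition 24.1 Let $G$ be a compatible $\Theta\Gamma$ $N$-Group. If $I \trianglelefteq G$ then $G/I =
\{g + I \mid g \in G\}$ is a compatible $\Theta\Gamma$ $N$-Group.


Proof To show $G/I$ is a compatible $\Theta\Gamma$ $N$-Group, we first define operations $+$, $\theta$
on $G/I$ as follows:

$$\begin{aligned}
(g_1 + I) + (g_2 + I) &= (g_1 + g_2) + I, \\
(n + I)\theta(g_1 + I) &= n\theta g_1 + I.
\end{aligned}$$

It is easy to show that $+$ is well-defined. We will prove that $\theta$ is well-defined.
Let $n\theta g_1 + I = n\theta g_2 + I$ and $x \in n\theta g_1 + I$. Then $x = n\theta g_1 + i = (n + i_1)\theta(g_1 +
i_2) = (n + i_1)\theta(g_2 + i_3) = n\theta g_2 + i_4$. Hence $n\theta g_1 + I \subseteq n\theta g_2 + I$. Similarly, $n\theta g_2
+ I \subseteq n\theta g_1 + I$. Hence $n\theta g_1 + I = n\theta g_2 + I$. Now, we see that for $n, m \in N$, $\gamma \in
\Gamma$, there exists $\delta_\gamma \in \Delta_\Gamma$ such that $(n\gamma m)\theta g + I = n\delta_\gamma(m\theta(g + I))$.

Clearly $\theta$ is right distributive and quasi-associative. Hence $G/I$ is a compatible
$\Theta\Gamma$ $N$-Group.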


Definition 24.8 Let $G$ be a compatible $\Theta\Gamma$ $N$-Group. If $I \trianglelefteq G$, then $G/I =
\{g + I \mid g \in G\}$ is called the factor group.


Definition 24.9 Let $N$ be a nearring and $G$, $G'$ be compatible $\Theta\Gamma$ $N$-Groups. Then
$h : G \to G'$ is called a homomorphism if it satisfies


(i) $h(x + y) = h(x) + h(y)$ and
(ii) $h(n\theta g) = h(n)\theta h(g)$ for all $n \in N$, $x, y, g \in G$ and $\theta \in \Theta$.


$\operatorname{Ker} h = \{x \in G \mid h(x) = 0\}$.


Proposition 24.2 Let $G$ and $G'$ be compatible $\Theta\Gamma$ $N$-Groups and $f : G \to G'$ be
a homomorphism. Then $\ker f$ is an ideal of $G$. Conversely, every ideal is the kernel
of a homomorphism.


Proof We have $f(0) = 0$. Hence $0 \in \ker f$. Let $g \in G$, $n \in N$, $a \in \ker f$. Then
$f(a) = 0$. Now,

$$\begin{aligned}
f(g + a - g) &= f(g) + f(a) - f(g) = 0 \Rightarrow g + a - g \in \ker f, \\
f(n\theta(x + a) - n\theta x) &= f(n\theta(x + a)) - f(n\theta x) = f(n)\theta f(x + a) - f(n)\theta f(x) \\
&= f(n)\theta(f(x) + f(a)) - f(n)\theta f(x) = 0 \Rightarrow n\theta(x + a) - n\theta x \in \ker f, \\
f(a\theta g) &= f(a)\theta f(g) = 0\theta f(g) = 0 \Rightarrow a\theta g \in \ker f.
\end{aligned}$$

Hence $\ker f$ is an ideal of $G$. To prove the converse, let $I$ be an ideal of $G$ and define $\phi : G \to G/I$ by
$\phi(g) = g + I$. We prove that $\phi$ is well-defined. We have $g_1 = g_2 \Rightarrow
g_1 + I = g_2 + I \Rightarrow \phi(g_1) = \phi(g_2)$. Let $g + I \in G/I$. Then $\phi(g) = g + I$. Hence
$\phi$ is onto. $\phi$ is a homomorphism because

$$\begin{aligned}
\phi(g_1 + g_2) &= (g_1 + g_2) + I = (g_1 + I) + (g_2 + I) = \phi(g_1) + \phi(g_2), \\
\phi(n\theta g) &= n\theta g + I = (n + I)\theta(g + I) = \phi(n)\theta\phi(g).
\end{aligned}$$

Now, $\operatorname{Ker} \phi = \{x \in G \mid \phi(x) = 0 + I\} = \{x \in G \mid x + I = 0 + I\} = I$. Hence every ideal
is the kernel of a homomorphism.


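Theorem 24.1 Let $G$ and $G'$ be compatible $\Theta\Gamma$ $N$-Groups, $f : G \to G'$ a surjective
homomorphism and $K = \ker f$. Then $K$ is an ideal of $G$ and $G/K \cong G'$.


Proof Define $\phi : G/K \to G'$ by $\phi(a + K) = f(a)$. We will show that $\phi$ is well-defined
and one-one. Let $a, b \in G$. We have $a + K = b + K \Leftrightarrow a - b \in K \Leftrightarrow
f(a - b) = 0 \Leftrightarrow f(a) - f(b) = 0 \Leftrightarrow f(a) = f(b) \Leftrightarrow \phi(a + K) = \phi(b + K)$.
Now, we prove that $\phi$ is onto. Let $y \in G'$. As $f$ is onto, $y = f(a)$ for some $a \in G$. Now
$a + K \in G/K$ and $\phi(a + K) = f(a) = y$. Now we prove that $\phi$ is a homomorphism.
We have

$$\phi((a + K) + (b + K)) = \phi((a + b) + K) = f(a + b) = f(a) + f(b) = \phi(a + K) + \phi(b + K),$$

and for any $n \in N$, $\theta \in \Theta$, we have

$$\phi((n + K)\theta(a + K)) = \phi(n\theta a + K) = f(n\theta a) = f(n)\theta f(a) = \phi(n + K)\theta\phi(a + K).$$

Thus $G/K \cong G'$.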


Theorem 24.2 (1) Let $G$ and $G'$ be compatible $\Theta\Gamma$ $N$-Groups, $f : G \to G'$ be an onto
homomorphism and $H = \ker f$. If $K'$ is a subgroup (resp. ideal) of $G'$ and $K = \{x \in
G : f(x) \in K'\} = f^{-1}(K')$, then $K$ is a subgroup (resp. ideal) of $G$ with $H \subseteq K$ and
$G/K \cong G'/K'$.
(2) Let $H$ and $K$ be ideals of a compatible $\Theta\Gamma$ $N$-Group $G$ with $H \subseteq K$. Then
$G/K \cong (G/H)/(K/H)$.


Proof (1) Define $\phi : G \to G'/K'$ by $\phi(x) = f(x) + K'$. First, we show that $\phi$ is
a homomorphism. Let $a, b \in G$. We have $\phi(a + b) = f(a + b) + K' = (f(a) +
f(b)) + K'$. We claim $(f(a) + f(b)) + K' = (f(a) + K') + (f(b) + K')$. Note
that $x \in (f(a) + f(b)) + K' \Rightarrow x = f(a) + f(b) + k = f(a) + 0 + f(b) + k
\Rightarrow x \in (f(a) + K') + (f(b) + K') \Rightarrow (f(a) + f(b)) + K' \subseteq (f(a) + K') + (f(b)
+ K')$. Let $y \in (f(a) + K') + (f(b) + K')$. Hence there exist $k_1, k_2 \in K'$ such that
$y = f(a) + k_1 + f(b) + k_2$. Then $y = f(a) + f(b) - f(b) + k_1 + f(b) + k_2 \in
f(a) + f(b) + K' \Rightarrow (f(a) + K') + (f(b) + K') \subseteq (f(a) + f(b)) + K'$. Hence
$\phi(a + b) = (f(a) + f(b)) + K' = (f(a) + K') + (f(b) + K') = \phi(a) + \phi(b)$.
Now, $\phi(n\theta a) = f(n\theta a) + K' = f(n)\theta f(a) + K' = (f(n) + K')\theta(f(a) +
K') = \phi(n)\theta\phi(a)$. Now we prove that $\phi$ is onto. Let $a' + K' \in G'/K'$, $a' \in G'$.
Since $f$ is onto, there exists $a \in G$ such that $f(a) = a'$. Hence $\phi(a) = f(a) + K' =
a' + K'$. By Theorem 24.1, we get $G/\ker\phi \cong G'/K'$. Now, we have

$$\ker\phi = \{x \mid \phi(x) = e + K'\} = \{x \mid f(x) + K' = e + K'\} = \{x \mid f(x) \in K'\} = K.$$

Let $x, g \in G$, $k \in K$, $n \in N$ and $\theta \in \Theta$. As $\phi(k) = 0$, we get

$$\begin{aligned}
&\phi(x + k - x) = \phi(x) + \phi(k) - \phi(x) = 0 \Rightarrow x + k - x \in K; \\
&\phi(k\theta g) = \phi(k)\theta\phi(g) = 0\theta\phi(g) = 0 \Rightarrow k\theta g \in K; \\
&\phi(n\theta(x + k) - n\theta x) = \phi(n\theta(x + k)) - \phi(n\theta x) = \phi(n)\theta(\phi(x) + \phi(k)) - \phi(n)\theta\phi(x) = 0 \\
&\quad \Rightarrow n\theta(x + k) - n\theta x \in K.
\end{aligned}$$

Hence $K$ is a subgroup (resp. ideal) of $G$ and $G/K \cong G'/K'$. Let $x \in H$. Then
we have $f(x) = e \in K'$. This implies $\phi(x) = f(x) + K' = e + K' = K'$. We get
$x \in \ker\phi = K$. Hence $H \subseteq K$.

(2) $H$ is a normal subgroup of $G$ and $K$ is a normal subgroup of $G$ containing $H$. Hence $K/H$ and $G/H$ are factor $\Theta\Gamma$ N-groups. First, we prove that $K/H$ is a normal subgroup of $G/H$. Define $f : G/H \to G/K$ by $f(x + H) = x + K$. We prove that $f$ is well defined. We have $x + H = y + H \Rightarrow x - y \in H \Rightarrow x - y \in K \Rightarrow x + K = y + K \Rightarrow f(x + H) = f(y + H)$. We now show that $f$ is an onto homomorphism with $\ker f = K/H$. We have

\[ f[(x + H) + (y + H)] = f[(x + y) + H] = (x + y) + K = (x + K) + (y + K) = f(x + H) + f(y + H), \]

and

\[ f((n + H)\theta(x + H)) = f(n\theta x + H) = n\theta x + K = (n + K)\theta(x + K) = f(n + H)\theta f(x + H). \]

Let $x + K \in G/K$. Then there exists $x + H \in G/H$ such that $f(x + H) = x + K$. We have

\[ \ker f = \{g + H \mid f(g + H) = e + K\} = \{g + H \mid g \in K\} = K/H. \]

Hence $K/H$ is a normal subgroup of $G/H$. By Theorem 24.1, $(G/H)/\ker f \cong G/K$. Thus, $(G/H)/(K/H) \cong G/K$.


Chapter 25
The Range Column Sufficiency
and the Pseudo-SSM Property of Linear
Transformation on Euclidean Jordan
Algebra


Punit Kumar Yadav and K. Palpandi 


Mathematics Subject Classification (2010) 90C33 - 17C55 - 15A09 


25.1 Introduction 


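Consider a matrix $B = [b_{ij}]_{n \times n}$, $b_{ij} \in \mathbb{R}$, and a vector $q \in \mathbb{R}^n$; the standard linear complementarity problem (LCP), denoted by LCP$(B, q)$, is to find a vector $z \in \mathbb{R}^n$ such that

\[ z \in \mathbb{R}^n_+, \quad Bz + q \in \mathbb{R}^n_+ \quad \text{and} \quad z^T(Bz + q) = 0. \]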
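The three conditions of the standard LCP can be checked mechanically for a candidate vector. The following is a minimal Python sketch; the function name `is_lcp_solution`, the tolerance `tol`, and the data `B`, `q` are illustrative assumptions and do not come from the text.

```python
def is_lcp_solution(B, q, z, tol=1e-12):
    """Check z >= 0, Bz + q >= 0 and z^T (Bz + q) = 0 for LCP(B, q)."""
    n = len(q)
    # w = Bz + q
    w = [sum(B[i][j] * z[j] for j in range(n)) + q[i] for i in range(n)]
    feasible = all(zi >= -tol for zi in z) and all(wi >= -tol for wi in w)
    complementary = abs(sum(zi * wi for zi, wi in zip(z, w))) <= tol
    return feasible and complementary

# Illustrative data (an assumption, not from the text): B = I, q = (-1, 2).
B = [[1.0, 0.0], [0.0, 1.0]]
q = [-1.0, 2.0]
candidate = [1.0, 0.0]   # gives w = (0, 2), so all three conditions hold
origin = [0.0, 0.0]      # gives w = (-1, 2), which violates Bz + q >= 0
```

Here `is_lcp_solution(B, q, candidate)` holds while `is_lcp_solution(B, q, origin)` fails, because $q$ has a negative component.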


The LCP has many applications in optimization, economics, game theory, and marketing. The monograph of Cottle et al. [2] is a typical reference for the standard LCP. In recent years, various authors have introduced different classes of matrices for studying the structure of the solution sets of the LCP; see, for example, Jeyaraman and Sivakumar [11], Pang and Venkateswaran [13], Cottle et al. [2], Tao [17]. One of the most important classes is that of strictly semimonotone matrices. We say a matrix $B \in \mathbb{R}^{n \times n}$ is strictly semimonotone if

\[ z \ge 0, \quad z * Bz \le 0 \implies z = 0, \]

where $*$ denotes the Hadamard product and $z \ge 0$ means that each component of $z$ is in $\mathbb{R}_+$. Such a matrix gives a unique solution to the LCP on $\mathbb{R}^n_+$ (see, for example, Cottle et al. [2]). Motivated by this, a new class of matrices, called strictly range semimonotone, was introduced and studied by Jeyaraman and Sivakumar [11] in the context of the LCP. A matrix $B$ is strictly range semimonotone if

\[ 0 \le z \in R(B), \quad z * Bz \le 0 \implies z = 0, \]

where $R(B)$ is the range of the matrix $B$. Jeyaraman and Sivakumar [11] proved that a strictly range semimonotone matrix $B$ gives a unique solution of LCP$(B, q)$ on $R(B)$ when $q \in \mathbb{R}^n_+$. Due to the importance of strictly range semimonotone matrices in the LCP, the notion has been generalized and studied for linear transformations on $S^n$ (the space of real symmetric matrices) and $H^n$ (the space of Hermitian matrices) in Jeyaraman et al. [12].

Another important class of matrices, the so-called column sufficient matrices, was introduced in Pang and Venkateswaran [13]. A matrix $B \in \mathbb{R}^{n \times n}$ is called column sufficient if

\[ z * Bz = 0 \quad \text{whenever} \quad z * Bz \le 0. \]

Column sufficiency of a matrix is equivalent to the convexity of the solution sets of the corresponding LCP. Recently, Jeyaraman and Sivakumar [11] have extended the column sufficient matrix to the range column sufficient matrix. A matrix $B \in \mathbb{R}^{n \times n}$ is range column sufficient if

\[ z \in R(B), \quad z * Bz \le 0 \implies z * Bz = 0. \]

Further, they proved the convexity of the intersection of the solution set of the LCP and $R(B)$ if $B$ is range column sufficient. For a range column sufficient matrix $B$, they showed that the intersection of the solution set of LCP$(B, q)$ and $R(B)$ is the zero set for all $q \in \mathbb{R}^n_{++}$. They also proved that, for a range monotone Z-matrix, strict range semimonotonicity and range column sufficiency are equivalent on $\mathbb{R}^n_+$.

Motivated by this, we study the pSSM and the rCS property over Euclidean Jordan 
algebra. We study the solution set to LCP when the underlying linear transformation 
has the pSSM property (rCS property), and we study the interconnections of the 
pSSM and the range cone column sufficiency (rCCS) property for some special 
transformations. 

Here is an outline of the paper: In the next section, we give some preliminaries and definitions useful for our discussion. In Sect. 25.3, the range column sufficiency (rCS) and the pSSM properties are introduced and studied in the context of Euclidean Jordan algebra. In Sect. 25.4, we show an equivalence relation between the pSSM property and the rCCS property for some special transformations. Section 25.5 characterizes the rCS and the pSSM property for the relaxation transformation.
25.2 Preliminaries


Throughout the paper, we use the following notations:


(i) $\mathbb{R}^n$ denotes the Euclidean $n$-space and $\mathbb{R}^n_+$ ($\mathbb{R}^n_{++}$) denotes the nonnegative orthant (positive orthant).

(ii) The set of all complex (real) matrices of order $n$ is denoted by $\mathbb{C}^{n \times n}$ ($\mathbb{R}^{n \times n}$).

(iii) For a matrix $B$ of order $n$, $B^*$ denotes the conjugate transpose of $B$. If $B$ is positive semidefinite (positive definite), then we write $B \succeq 0$ ($\succ 0$).

(iv) The spectrum of a matrix $M$ and the spectral radius of $M$ are denoted by $\sigma(M)$ and $\rho(M)$, respectively.

(v) We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$.


25.2.1 Euclidean Jordan Algebra 


We state some basic concepts, structure, and results of Euclidean Jordan algebra (EJA) from Faraut and Korányi [3] and Gowda et al. [9].


(i) Let $(\mathcal{V}, \circ, \langle\cdot,\cdot\rangle)$ be a Euclidean Jordan algebra. Here $\circ$ is a given Jordan product. We say an element $e \in \mathcal{V}$ is a unit element if $x \circ e = x$ for all $x \in \mathcal{V}$. We define $K = \{x \circ x : x \in \mathcal{V}\}$ as the symmetric cone in $\mathcal{V}$, and $K$ is a self-dual closed convex cone. For $x \in \mathcal{V}$, we say that

\[ x \in K \ (K^{\circ}) \text{ if and only if } x \ge 0 \ (> 0). \]

(ii) A Jordan frame $\{e_i\}_{i=1}^r$ is a finite set of primitive idempotents in $\mathcal{V}$ which satisfies

\[ e_i \circ e_j = 0 \text{ if } i \ne j \quad \text{and} \quad \sum_{i=1}^r e_i = e. \]

(iii) Given any $x \in \mathcal{V}$, there exists a Jordan frame $\{e_i\}_{i=1}^r$ such that

\[ x = \sum_{i=1}^r \alpha_i e_i, \]

where $\alpha_i \in \mathbb{R}$. This expression is called the spectral decomposition of $x$, and the real numbers $\alpha_i$ are called the eigenvalues of $x \in \mathcal{V}$. This can be rewritten as

\[ x = x^+ - x^- \quad \text{with} \quad \langle x^+, x^- \rangle = 0, \]

where $x^+ = \sum_{i=1}^r \alpha_i^+ e_i$, $x^- = \sum_{i=1}^r \alpha_i^- e_i$, $\alpha_i^+ := \max\{0, \alpha_i\}$ and $\alpha_i^- := \alpha_i^+ - \alpha_i$.
(iv) Fix $a \in \mathcal{V}$; we define a linear transformation $L_a$ on $\mathcal{V}$ as

\[ L_a(v) = a \circ v \]

for all $v \in \mathcal{V}$. Any two elements $u, v \in \mathcal{V}$ operator commute if $L_u L_v = L_v L_u$. From Lemma X.2.2 in Faraut and Korányi [3], $x$ and $y$ operator commute if and only if $x$ and $y$ have their spectral decompositions with respect to a common Jordan frame. By a proposition in Gowda et al. [9], we observe that $\langle x, y \rangle = 0$ if and only if $x \circ y = 0$ for $x, y \in K$. In each case, $x$ and $y$ operator commute.

(v) For a fixed Jordan frame $\{e_i\}_{i=1}^r$ in $\mathcal{V}$ and $1 \le i, j \le r$, we write the eigenspaces of $\mathcal{V}$ as

\[ \mathcal{V}_{ii} = \{x \in \mathcal{V} : x \circ e_i = x\} = \mathbb{R} e_i, \]

and, when $i \ne j$,

\[ \mathcal{V}_{ij} = \{x \in \mathcal{V} : x \circ e_i = \tfrac{1}{2} x = x \circ e_j\}. \]

By Theorem IV.2.1 in Faraut and Korányi [3], given any Jordan frame $\{e_i\}_{i=1}^r$, we can write any $x \in \mathcal{V}$ as

\[ x = \sum_{i=1}^r x_i e_i + \sum_{i<j} x_{ij}, \]

where $x_i$ is a real number and $x_{ij} \in \mathcal{V}_{ij}$. This expression is called the Peirce decomposition.

(vi) An algebra automorphism $A$ on $\mathcal{V}$ is an invertible linear transformation which satisfies

\[ A(x \circ y) = A(x) \circ A(y) \quad \text{for all } x, y \in \mathcal{V}. \]

We denote by $\mathrm{Aut}(\mathcal{V})$ the set of all algebra automorphisms of $\mathcal{V}$. We say $A$ is an orthogonal automorphism if $AA^T(x) = A^T A(x) = x$ for all $x \in \mathcal{V}$.
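To make the spectral decomposition concrete, the following Python sketch works in the algebra of real symmetric matrices, assuming the usual Jordan product $x \circ y = (xy + yx)/2$ and the trace inner product (a standard choice of Euclidean Jordan algebra, taken here as an assumption). The function names and the sample matrix are illustrative only.

```python
import numpy as np

def jordan_product(x, y):
    """Jordan product on real symmetric matrices: x o y = (xy + yx) / 2."""
    return (x @ y + y @ x) / 2

def spectral_parts(x):
    """Return (x_plus, x_minus) with x = x_plus - x_minus."""
    eigvals, eigvecs = np.linalg.eigh(x)     # columns of eigvecs: orthonormal eigenvectors
    plus = np.maximum(eigvals, 0.0)          # alpha_i^+ = max{0, alpha_i}
    minus = plus - eigvals                   # alpha_i^- = alpha_i^+ - alpha_i
    x_plus = (eigvecs * plus) @ eigvecs.T    # sum_i alpha_i^+ e_i
    x_minus = (eigvecs * minus) @ eigvecs.T  # sum_i alpha_i^- e_i
    return x_plus, x_minus

x = np.array([[1.0, 2.0], [2.0, -2.0]])      # eigenvalues -3 and 2
x_plus, x_minus = spectral_parts(x)
inner = np.trace(x_plus @ x_minus)           # <x^+, x^-> in the trace inner product
```

In this sketch the Jordan frame consists of the rank-one projections onto the eigenvectors returned by `np.linalg.eigh`, so `x_plus` and `x_minus` share a Jordan frame and their Jordan product vanishes.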


25.2.2 The Linear Complementarity Concepts


Given a linear transformation $\mathcal{L}$ on $\mathcal{V}$ and a vector $q \in \mathcal{V}$, the LCP$(\mathcal{L}, q)$ is to find $x \in \mathcal{V}$ such that

\[ x \in K, \quad y = \mathcal{L}(x) + q \in K \quad \text{and} \quad \langle x, y \rangle = 0. \]

If such an $x$ exists, then $x$ is called a solution to LCP$(\mathcal{L}, q)$, and we denote by SOL$(\mathcal{L}, q)$ the set of all solutions to LCP$(\mathcal{L}, q)$. For $\mathcal{V} = \mathbb{R}^n$ and $K = \mathbb{R}^n_+$, the LCP becomes the standard LCP. If $\mathcal{V} = S^n$ and $K = S^n_+$, then the LCP is called the semidefinite LCP (SDLCP); see Gowda and Parthasarathy [4], Gowda and Song [6], Tao [17]. Here are some properties related to the LCP.


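Definition 25.1 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then we say $\mathcal{L}$ has the


(i) monotone property if $\langle x, \mathcal{L}(x) \rangle \ge 0$ for all $x \in \mathcal{V}$.
(ii) P property if

\[ \{x \text{ operator commutes with } \mathcal{L}(x) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x = 0. \]

(iii) strict semimonotone (SSM) property if

\[ \{x \in K,\ x \text{ operator commutes with } \mathcal{L}(x) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x = 0. \]

(iv) column sufficiency property if

\[ \{x \text{ operator commutes with } \mathcal{L}(x) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x \circ \mathcal{L}(x) = 0. \]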


25.2.3 Z Property 


We now recall the Z property of a linear transformation from Gowda and Tao [8]. A linear transformation $\mathcal{L} : \mathcal{V} \to \mathcal{V}$ has the Z property on $K$ if

\[ x, y \in K \text{ and } \langle x, y \rangle = 0 \implies \langle \mathcal{L}(x), y \rangle \le 0. \]

In short, we write it as $\mathcal{L} \in Z$. Here are some examples of Z-transformations.


Example 1 


All the matrices with non-positive off-diagonal entries are Z-matrices with respect to $\mathbb{R}^n_+$.


Example 2 


Let $\mathcal{V} = H^n$ and $K = H^n_+$ (the cone of positive semidefinite Hermitian matrices). Then for $A \in \mathbb{C}^{n \times n}$, the Lyapunov transformation $L_A(X) = AX + XA^*$ and the Stein transformation $S_A(X) = X - AXA^*$ are Z-transformations.
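The Z property of these two transformations can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses the trace inner product on $H^n$, a randomly generated matrix `A`, and the rank-one pair `X`, `Y` built from two orthonormal vectors; none of these names or data come from the text.

```python
import numpy as np

def lyapunov(A, X):
    """Lyapunov transformation L_A(X) = AX + XA*."""
    return A @ X + X @ A.conj().T

def stein(A, X):
    """Stein transformation S_A(X) = X - AXA*."""
    return X - A @ X @ A.conj().T

def inner(X, Y):
    """Trace inner product on Hermitian matrices (real-valued)."""
    return np.trace(X @ Y).real

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Orthonormal columns u, w give X = uu*, Y = ww* in H^n_+ with <X, Y> = 0.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
u, w = Q[:, [0]], Q[:, [1]]
X, Y = u @ u.conj().T, w @ w.conj().T

lyap_value = inner(lyapunov(A, X), Y)   # vanishes because XY = YX = 0
stein_value = inner(stein(A, X), Y)     # equals -|w* A u|^2, hence nonpositive
```

A single random pair is of course not a proof; the check only illustrates the inequality $\langle \mathcal{L}(X), Y \rangle \le 0$ on one orthogonal pair in the cone.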


We recall a well-known result from Schneider and Vidyasagar [16], which will 
be used later. 
Theorem 25.1 Let $\mathcal{L} \in Z$. Then $\min\{\operatorname{Re}(\mu) : \mu \in \sigma(\mathcal{L})\} \in \sigma(\mathcal{L})$ with a corresponding eigenvector in $K$. Here $\operatorname{Re}(\mu)$ is the real part of $\mu$.


25.2.4 Group Inverse 


We say a matrix $A \in \mathbb{C}^{n \times n}$ has a group inverse if there exists a unique matrix $Y \in \mathbb{C}^{n \times n}$ which satisfies $A = AYA$, $Y = YAY$, $AY = YA$. It is denoted by $A^{\#}$. The group inverse of $A$ is the same as the inverse of $A$ when $A$ is invertible. For more details, refer to the book of Ben-Israel and Greville [1]. The group inverse for a linear transformation has been discussed by Robert [15]. The group inverse of a linear transformation $\mathcal{L}$ on $\mathcal{V}$ is a linear transformation $T$ on $\mathcal{V}$ that satisfies $T\mathcal{L}T = T$, $\mathcal{L}T\mathcal{L} = \mathcal{L}$, $\mathcal{L}T = T\mathcal{L}$. It is denoted by $\mathcal{L}^{\#}$. We recall the following characterizations and definitions from Robert [15], which will be used later.


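(T1) $\mathcal{L}^{\#}$ exists if and only if $\mathcal{V} = R(\mathcal{L}) \oplus N(\mathcal{L})$, i.e., $\mathcal{V}$ is the direct sum of the range space and the null space of $\mathcal{L}$. Here, $R(\mathcal{L})$ and $N(\mathcal{L})$ denote the range space and the null space of $\mathcal{L}$, respectively.

(T2) $R(\mathcal{L})$, $R(\mathcal{L}^{\#})$, and the range of $\mathcal{L}\mathcal{L}^{\#}$ are the same.

(T3) If $\mathcal{L}^{\#}$ exists and $x \in R(\mathcal{L})$, then $\mathcal{L}\mathcal{L}^{\#}(x) = x$.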
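For matrices, the three defining equations and the existence condition in (T1) can be tried out directly. The sketch below is an illustration, not part of the source: it assumes the standard identity $A^{\#} = A(A^3)^{\dagger}A$ (with $\dagger$ the Moore-Penrose inverse), valid when $\operatorname{rank}(A) = \operatorname{rank}(A^2)$, and the function name `group_inverse` and the sample matrices are hypothetical.

```python
import numpy as np

def group_inverse(A):
    """Group inverse via A# = A (A^3)^+ A, valid when rank(A) == rank(A^2)."""
    A = np.asarray(A, dtype=float)
    # rank(A) == rank(A^2) is equivalent to the direct sum R(A) + N(A) in (T1).
    if np.linalg.matrix_rank(A) != np.linalg.matrix_rank(A @ A):
        raise ValueError("group inverse does not exist: R(A) + N(A) is not direct")
    return A @ np.linalg.pinv(A @ A @ A) @ A

A = np.array([[2.0, 1.0], [0.0, 0.0]])   # singular, but rank(A) = rank(A^2) = 1
Y = group_inverse(A)                     # here A^2 = 2A, so A# = A / 4
x = A @ np.array([1.0, 1.0])             # an element of the range of A
```

For this `A`, the products `A @ Y @ A`, `Y @ A @ Y` and the commutation `A @ Y == Y @ A` reproduce the defining equations, and `A @ Y @ x` returns `x`, as in (T3). A nilpotent matrix such as `[[0, 1], [0, 0]]` fails the rank test and raises `ValueError`.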


We end this section with an easy observation that the group inverse exists for any 
self-adjoint transformation. 


25.3 The Range Column Sufficiency Property 
and the pSSM Property 


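In this section, we extend the range column sufficiency and the pSSM properties to Euclidean Jordan algebra.


Definition 25.2 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then we say $\mathcal{L}$ has the


(i) range column sufficiency (rCS) property if

\[ \{x \in R(\mathcal{L}),\ x \text{ operator commutes with } \mathcal{L}(x) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x \circ \mathcal{L}(x) = 0. \]

(ii) range cone column sufficiency (rCCS) property if

\[ \{0 \le x \in R(\mathcal{L}),\ x \text{ operator commutes with } \mathcal{L}(x) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x \circ \mathcal{L}(x) = 0. \]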
Proposition 25.1 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then,


(i) If the group inverse of $\mathcal{L}$ exists, then $\mathcal{L}$ has the rCS property if and only if the group inverse of $\mathcal{L}$ has the rCS property.

(ii) $\mathcal{L}$ has the rCS property if and only if $\widehat{\mathcal{L}} := A^T \mathcal{L} A$ has the rCS property for all orthogonal automorphisms $A$.


Proof (i). Assume $\mathcal{L}$ has the rCS property. Let $x \in R(\mathcal{L}^{\#})$ be such that $x$ operator commutes with $\mathcal{L}^{\#}(x)$ and $x \circ \mathcal{L}^{\#}(x) \le 0$. Let $z := \mathcal{L}^{\#}(x)$. As $x \in R(\mathcal{L}^{\#})$, from (T2) and (T3) we have $z \in R(\mathcal{L})$ and $\mathcal{L}(z) = x$. Since $x$ operator commutes with $\mathcal{L}^{\#}(x)$ and $x \circ \mathcal{L}^{\#}(x) \le 0$, $z$ and $\mathcal{L}(z)$ operator commute and $z \circ \mathcal{L}(z) \le 0$. From the rCS property of $\mathcal{L}$, we get $z \circ \mathcal{L}(z) = 0$, which implies that $x \circ \mathcal{L}^{\#}(x) = 0$. Hence $\mathcal{L}^{\#}$ has the rCS property. By a similar argument, the reverse implication can be proved.

We now prove (ii). Let $x \in R(\widehat{\mathcal{L}})$ be such that $x$ operator commutes with $\widehat{\mathcal{L}}(x)$ and $x \circ \widehat{\mathcal{L}}(x) \le 0$. Then there exists a common Jordan frame $\{e_i\}_{i=1}^r$ such that

\[ x = \sum_{i=1}^r \lambda_i e_i, \quad \widehat{\mathcal{L}}(x) = \sum_{i=1}^r \mu_i e_i \quad \text{and} \quad \lambda_i \mu_i \le 0 \ \ \forall i \in [r]. \]

Our aim is to prove that $\lambda_i \mu_i = 0$ for all $i \in [r]$. As $x \in R(\widehat{\mathcal{L}})$, we can find $z \in \mathcal{V}$ with $x = A^T \mathcal{L} A(z)$; since $A$ is an orthogonal automorphism, this gives $A(x) = \mathcal{L}(A(z))$ and hence $A(x) \in R(\mathcal{L})$. By Proposition 4.2 in Gowda and Sznajder [7], $\{A(e_i)\}_{i=1}^r$ is a Jordan frame in $\mathcal{V}$ and $(A^T)^{-1}(e_i) = A(e_i)$ for all $i \in [r]$. So we have

\[ A(x) = \sum_{i=1}^r \lambda_i A(e_i) \in R(\mathcal{L}) \quad \text{and} \quad \mathcal{L}(A(x)) = \sum_{i=1}^r \mu_i A(e_i). \]

Here, $A(x)$ and $\mathcal{L}(A(x))$ have their spectral decompositions with respect to a common Jordan frame. Hence, $A(x)$ operator commutes with $\mathcal{L}(A(x))$ and

\[ A(x) \circ \mathcal{L}(A(x)) = \sum_{i=1}^r \lambda_i \mu_i A(e_i). \]

As $\lambda_i \mu_i \le 0$ for all $i \in [r]$, $A(x) \circ \mathcal{L}(A(x)) \le 0$. Since $\mathcal{L}$ has the rCS property, we have $\lambda_i \mu_i = 0$ for all $i \in [r]$. Hence $\widehat{\mathcal{L}}$ has the rCS property. The reverse implication follows immediately by choosing $A$ as the identity transformation.


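Definition 25.3 Let $\mathcal{L}$ be linear on $\mathcal{V}$ and $S$ be a subset of $\mathcal{V}$. Then we say $\mathcal{L}$ has the range cross commutative property on $S$ if for any $q \in S$, $x_1$ operator commutes with $y_2$ and $x_2$ operator commutes with $y_1$, where $x_1, x_2 \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ and $y_i = \mathcal{L}(x_i) + q$, $i = 1, 2$.


Next, we give a sufficient condition for the convexity of $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$.


Theorem 25.2 Let $\mathcal{L}$ be linear on $\mathcal{V}$. If $\mathcal{L}$ has the rCS property and the range cross commutative property on $\mathcal{V}$, then the following are true.


(i) $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ is a convex set for all $q \in \mathcal{V}$.
(ii) $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L}) = \{0\}$ for all $q > 0$.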
Proof (i): Let $x_1, x_2 \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$. As $R(\mathcal{L})$ is a subspace, we have $t x_1 + (1 - t) x_2 \in R(\mathcal{L})$ for all $0 \le t \le 1$. Note that for all $0 \le t \le 1$,

\[ t x_1 + (1 - t) x_2 \ge 0 \quad \text{and} \quad \mathcal{L}(t x_1 + (1 - t) x_2) + q \ge 0. \]

Let $y_i := \mathcal{L}(x_i) + q$. As $x_1, y_2 \ge 0$, by the range cross commutative property on $\mathcal{V}$, $x_1$ operator commutes with $y_2$ and $x_1 \circ y_2 \ge 0$. Similarly, $x_2 \circ y_1 \ge 0$. Therefore, $(x_1 - x_2)$ operator commutes with $\mathcal{L}(x_1 - x_2)$ and $(x_1 - x_2) \circ \mathcal{L}(x_1 - x_2) = -(x_1 \circ y_2 + x_2 \circ y_1) \le 0$. Since $x_1, x_2 \in R(\mathcal{L})$, we have $x_1 - x_2 \in R(\mathcal{L})$, and from the rCS property of $\mathcal{L}$ we get $(x_1 - x_2) \circ \mathcal{L}(x_1 - x_2) = 0$, which implies that

\[ x_1 \circ y_2 = 0 \quad \text{and} \quad x_2 \circ y_1 = 0. \tag{25.1} \]

From Eq. (25.1), we have

\[ (t x_1 + (1 - t) x_2) \circ (\mathcal{L}(t x_1 + (1 - t) x_2) + q) = t(1 - t)[x_1 \circ y_2 + x_2 \circ y_1] = 0. \]

Therefore, $t x_1 + (1 - t) x_2 \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ for all $0 \le t \le 1$. This proves item (i).

We now prove (ii): Let $q > 0$ with $0 \ne x \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$. As $q > 0$, $0 \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ and, by item (i), $t x \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ for all $0 \le t \le 1$. So we have $x \circ (\mathcal{L}(t x) + q) = 0$ for all $0 < t \le 1$. Letting $t \to 0$, we get $x \circ q = 0$, which implies $x = 0$ as $q > 0$. Thus we get a contradiction. Hence we have our claim.


The following example illustrates that we cannot drop the range cross commutative property in the above theorem to show that $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$ is equal to $\{0\}$ whenever $q > 0$.


Example 3
Let $A = \begin{bmatrix} -1 & 3 \\ -3 & 3 \end{bmatrix}$ and $X = \begin{bmatrix} x_1 & x_2 \\ x_2 & x_3 \end{bmatrix} \in S^2$. Then

\[ L_A(X) = \begin{bmatrix} -2x_1 + 6x_2 & 2x_2 + 3x_3 - 3x_1 \\ 2x_2 + 3x_3 - 3x_1 & -6x_2 + 6x_3 \end{bmatrix}. \]

Here we can easily see that $A$ is a positive stable matrix. Hence, by a result from Gowda and Song [6], $L_A$ has the P property, and therefore $L_A$ has the rCS property.

Assume $Q = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} \in S^2_{++}$. Then $Z = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \in \mathrm{SOL}(L_A, Q) \cap R(L_A)$. Now let us take $Y = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \in S^2_+$; for $U = \begin{bmatrix} 5/8 & 3/8 \\ 3/8 & 3/8 \end{bmatrix} \in S^2$, we get $Y = L_A(U)$, which implies $Y \in R(L_A)$. By an easy computation we get $L_A(Y) + Q = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} \in S^2_+$ and $Y(L_A(Y) + Q) = 0$. Therefore, we can see that $Z, Y \in \mathrm{SOL}(L_A, Q) \cap R(L_A)$ and $Y(L_A(Z) + Q) \ne (L_A(Z) + Q)Y$. So $L_A$ does not have the range cross commutative property and $\mathrm{SOL}(L_A, Q) \cap R(L_A) \ne \{0\}$ for $Q > 0$.
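The computations in Example 3 can be replayed exactly with rational arithmetic. The sketch below assumes the data are read as $A = \begin{bmatrix} -1 & 3 \\ -3 & 3 \end{bmatrix}$, $Q = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}$, $Y = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $U = \begin{bmatrix} 5/8 & 3/8 \\ 3/8 & 3/8 \end{bmatrix}$, with $L_A(X) = AX + XA^T$; the helper names are illustrative and not from the text.

```python
from fractions import Fraction as F

def matmul(P, R):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def add(P, R):
    return [[P[i][j] + R[i][j] for j in range(2)] for i in range(2)]

def lyapunov(A, X):
    """L_A(X) = AX + XA^T on 2x2 real symmetric matrices."""
    return add(matmul(A, X), matmul(X, transpose(A)))

A = [[F(-1), F(3)], [F(-3), F(3)]]
Q = [[F(2), F(3)], [F(3), F(5)]]
Y = [[F(1), F(0)], [F(0), F(0)]]
U = [[F(5, 8), F(3, 8)], [F(3, 8), F(3, 8)]]
ZERO = [[F(0), F(0)], [F(0), F(0)]]

W = add(lyapunov(A, Y), Q)   # L_A(Y) + Q
YQ = matmul(Y, Q)            # Y (L_A(Z) + Q) with Z = 0
QY = matmul(Q, Y)            # (L_A(Z) + Q) Y with Z = 0
```

With these data, `lyapunov(A, U)` equals `Y`, `W` is the diagonal matrix with entries 0 and 5, `matmul(Y, W)` is zero, and `YQ` differs from `QY`, which is the failure of commutativity that the example relies on.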


Next, we extend the pSSM property on Euclidean Jordan algebra. 


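Definition 25.4 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then $\mathcal{L}$ has the pseudo strict semimonotone (pSSM) property if

\[ \{x \text{ operator commutes with } \mathcal{L}(x),\ 0 \le x \in R(\mathcal{L}) \text{ and } x \circ \mathcal{L}(x) \le 0\} \implies x = 0. \]


In the next theorem, we give an equivalence relation for the intersection of the solution set of the LCP and the range of a linear transformation with the pSSM property.


Theorem 25.3 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then the following are equivalent:


(i) $\mathcal{L}$ has the pSSM and the range cross commutative property on $K$.
(ii) $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L}) = \{0\}$ for all $q \in K$.


Proof (i) $\Rightarrow$ (ii): Let $q \in K$ and $x \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$. Then $x$ operator commutes with $\mathcal{L}(x) + q$ and $x \circ (\mathcal{L}(x) + q) = 0$, so $x \circ \mathcal{L}(x) = -x \circ q$. As $0 \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$, by the range cross commutative property of $\mathcal{L}$ on $K$, $x$ operator commutes with $q$ and hence $x \circ q \ge 0$, which implies $x \circ \mathcal{L}(x) \le 0$. Since $\mathcal{L}$ has the pSSM property, we have $x = 0$. Hence $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L}) = \{0\}$.

(ii) $\Rightarrow$ (i): Suppose $\mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L}) = \{0\}$ for all $q \in K$. We can easily check that $\mathcal{L}$ has the range cross commutative property on $K$. Suppose there is a vector $x \in R(\mathcal{L}) \cap K$ such that $x$ and $\mathcal{L}(x)$ operator commute and $x \circ \mathcal{L}(x) \le 0$. This gives that $x$ and $\mathcal{L}(x)^+$ operator commute and $x \circ \mathcal{L}(x)^+ = 0$. Now set $q = -\mathcal{L}(x) + \mathcal{L}(x)^+$; then $q = \mathcal{L}(x)^-$ and hence $q \ge 0$. As $x \circ \mathcal{L}(x)^+ = 0$ and $\mathcal{L}(x) + q = \mathcal{L}(x) + \mathcal{L}(x)^- = \mathcal{L}(x)^+ \ge 0$, we get $x \circ (\mathcal{L}(x) + q) = x \circ \mathcal{L}(x)^+ = 0$. So we have $x \in \mathrm{SOL}(\mathcal{L}, q) \cap R(\mathcal{L})$. Then $x = 0$. Hence $\mathcal{L}$ has the pSSM property.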


We say a matrix is semipositive stable (positive stable) if real parts of all the 
eigenvalues of the matrix are nonnegative (positive). Now we discuss a spectral 
property of the range cone column sufficient Z-transformation. 


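Theorem 25.4 Let $\mathcal{L} \in Z$. If $\mathcal{L}$ has the rCCS property, then $\mathcal{L}$ is semipositive stable.


Proof Assume $\mathcal{L}$ has the rCCS property. As $\mathcal{L}$ is a Z-transformation, by Theorem 25.1 there is a non-zero vector $x \in K$ such that $\mathcal{L}(x) = \mu_{\min} x$, where $\mu_{\min} = \min\{\operatorname{Re}(\mu) : \mu \in \sigma(\mathcal{L})\}$. It is enough to show that $\mu_{\min}$ is nonnegative. Assume the contrary, i.e., $\mu_{\min} < 0$. Then

\[ x \circ \mathcal{L}(x) = \mu_{\min} x^2 \le 0. \]

As $\mu_{\min} \ne 0$, we have $x = \mathcal{L}(x / \mu_{\min}) \in R(\mathcal{L})$, and the rCCS property gives $\mu_{\min} x^2 = 0$. Hence $x = 0$, which contradicts the fact that $x \ne 0$. So $\mu_{\min} \ge 0$. Hence proved.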
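Remark 25.1 The following example shows that, for a Z-transformation, the converse of the above theorem is not true. Let

\[ \mathcal{L} = \begin{bmatrix} 0 & -1 & -2 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} : \mathbb{R}^3 \to \mathbb{R}^3 \quad \text{and} \quad x = (1, 1, 0)^T \in \mathbb{R}^3_+. \]

For $y = (0, 1, -1)^T$, we get $\mathcal{L}(y) = x$, so $x \in R(\mathcal{L})$. Here, all the eigenvalues of $\mathcal{L}$ are zero, so $\mathcal{L}$ is semipositive stable; the off-diagonal entries of $\mathcal{L}$ are non-positive, so $\mathcal{L}$ is a Z-transformation on $\mathbb{R}^3_+$; and

\[ 0 \le x \in R(\mathcal{L}) \quad \text{and} \quad x * \mathcal{L}(x) = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} \le 0. \]

But $x * \mathcal{L}(x) \ne 0$. So $\mathcal{L}$ is not range cone column sufficient.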
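The arithmetic of this remark is easy to replay. The following plain-Python sketch uses the matrix and vectors as read from the remark; the helper name `apply` is illustrative.

```python
# Data as read from Remark 25.1.
L = [[0, -1, -2],
     [0, 0, -1],
     [0, 0, 0]]
x = [1, 1, 0]
y = [0, 1, -1]

def apply(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

Ly = apply(L, y)                                # equals x, so x lies in R(L)
Lx = apply(L, x)
hadamard = [xi * wi for xi, wi in zip(x, Lx)]   # componentwise product x * L(x)
```

Since `L` is strictly upper triangular, all its eigenvalues are zero; the Hadamard product `hadamard` is nonpositive but not zero, which is exactly the failure of the rCCS property.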


25.4 Equivalence of rCCS and pSSM Properties for Some 
Special Transformations 


In matrix theory, a matrix $A \in \mathbb{R}^{n \times n}$ is a range monotone matrix if $Ax \ge 0$, $x \in R(A)$ implies $x \ge 0$. Various characterizations have been given for range monotone matrices; see Peris and Subiza [14]. For a range monotone Z-matrix, strict range semimonotonicity is equivalent to range column sufficiency on $\mathbb{R}^n_+$; see Jeyaraman and Sivakumar [11]. A natural question is whether this relation holds for linear transformations in general. This is answered in the present section. To proceed further, we first define the range monotone property for a linear transformation.


Definition 25.5 Let $\mathcal{L}$ be linear on $\mathcal{V}$. Then $\mathcal{L}$ has the range monotone property if

\[ \mathcal{L}(x) \in K, \quad x \in R(\mathcal{L}) \implies x \in K. \]

In this case, we say $\mathcal{L}$ is a range monotone transformation.


Theorem 25.5 Let $\mathcal{L}$ be a range monotone Z-transformation on $\mathcal{V}$. Then the following are equivalent.


(i) $\mathcal{L}$ has the pSSM property.
(ii) $\mathcal{L}$ has the rCCS property.


Proof (i) $\Rightarrow$ (ii) is obvious.
(ii) $\Rightarrow$ (i): Let $0 \le x \in R(\mathcal{L})$ and let $x$ operator commute with $\mathcal{L}(x)$ such that $x \circ \mathcal{L}(x) \le 0$. Then there exists a Jordan frame $\{e_i\}_{i=1}^r$ such that

\[ x = \sum_{i=1}^r \alpha_i e_i, \quad \mathcal{L}(x) = \sum_{i=1}^r \beta_i e_i \quad \text{and} \quad \alpha_i \beta_i \le 0, \ i \in [r]. \]

From the rCCS property of $\mathcal{L}$, $x \circ \mathcal{L}(x) = 0$ and therefore we have $\alpha_i \beta_i = 0$, $i \in [r]$. If $x > 0$, then $\mathcal{L}(x) = 0$ and hence $\mathcal{L}(-x) = 0$. As $\mathcal{L}(-x) \in K$ and $-x \in R(\mathcal{L})$, from the range monotone property of $\mathcal{L}$, $-x \in K$, which implies $x = 0$. This is a contradiction. Therefore, $x$ must have a zero eigenvalue. We are done if all the eigenvalues of $x$ are zero. If not, $x$ has a non-zero eigenvalue. Without loss of generality, we suppose $\alpha_i > 0$ for all $1 \le i \le k$ and $\alpha_i = 0$ for all $k + 1 \le i \le r$. Note that $\beta_i = 0$ for all $1 \le i \le k$. As $\alpha_i = 0$ for all $k + 1 \le i \le r$, we have $\langle x, e_i \rangle = 0$ for all $k + 1 \le i \le r$, which implies $\langle \mathcal{L}(x), e_i \rangle \le 0$ for all $k + 1 \le i \le r$, as $\mathcal{L}$ is a Z-transformation. So $\beta_i \le 0$ for all $k + 1 \le i \le r$. Hence we get $\mathcal{L}(x) \le 0$. As $\mathcal{L}(-x) \in K$ and $-x \in R(\mathcal{L})$, from the range monotone property of $\mathcal{L}$ we have $-x \in K$, which implies $x = 0$. This is a contradiction. Hence $\mathcal{L}$ has the pSSM property.


Next, we show the equivalence of the pSSM property, the range cone column sufficiency, and the semipositive stability for the Lyapunov transformation $L_A$.


Theorem 25.6 Let $A \in \mathbb{C}^{n \times n}$. If $L_A^{\#}$ exists, the following are equivalent:


(i) $A$ is semipositive stable.
(ii) $L_A$ has the pSSM property.
(iii) $L_A$ has the rCCS property.
(iv) $L_A$ is semipositive stable.


Proof (i) $\Rightarrow$ (ii): If $A$ is semipositive stable, then from Theorem 3.11 in Jeyaraman et al. [12], $L_A$ has the pSSM property.

(ii) $\Rightarrow$ (iii): By definition, the pSSM property implies the rCCS property.

(iii) $\Rightarrow$ (iv): It follows from Theorem 25.4.

(iv) $\Rightarrow$ (i): Let $Av = \lambda v$ with $v \ne 0$. Then we have

\[ L_A(vv^*) = Avv^* + vv^*A^* = (\lambda + \bar{\lambda})vv^* = 2\operatorname{Re}(\lambda)vv^*. \]

As $L_A$ is semipositive stable, $\operatorname{Re}(\lambda) \ge 0$. Hence $A$ is semipositive stable.


In the following theorem, we show equivalence between the pSSM property and 
rCCS property for S4. 


Theorem 25.7 Let $A \in \mathbb{C}^{n \times n}$ be such that $S_A^{\#}$ exists. Then the following are equivalent:


(i) $\rho(A) \le 1$, i.e., all the eigenvalues of $A$ lie in the closed unit disk.
(ii) $S_A$ has the pSSM property.
(iii) $S_A$ has the rCCS property.
(iv) $S_A$ is semipositive stable.


Proof (i) $\Rightarrow$ (ii): From Theorem 3.16 in Jeyaraman et al. [12], $S_A$ has the pSSM property.

(ii) $\Rightarrow$ (iii): Follows from the definition.

(iii) $\Rightarrow$ (iv): By Theorem 25.4, $S_A$ is semipositive stable.

(iv) $\Rightarrow$ (i): Suppose there exist $\lambda \in \sigma(A)$ and a non-zero vector $u$ such that $Au = \lambda u$ and $|\lambda| > 1$. Now

\[ S_A(uu^*) = uu^* - Auu^*A^* = (1 - |\lambda|^2)uu^*. \]

Then $|\lambda| > 1$ implies $(1 - |\lambda|^2) < 0$, so $S_A$ has a negative eigenvalue, which contradicts our assumption. Hence $\rho(A) \le 1$. This completes the proof.
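The two eigenvalue identities used in the proofs of Theorems 25.6 and 25.7, namely $L_A(vv^*) = 2\operatorname{Re}(\lambda)vv^*$ and $S_A(vv^*) = (1 - |\lambda|^2)vv^*$ for $Av = \lambda v$, can be checked on a small example. The matrix below is an illustrative assumption (upper triangular, so that the first unit vector is an eigenvector); it is not taken from the text.

```python
import numpy as np

A = np.array([[0.5 + 2.0j, 1.0], [0.0, -0.25j]])   # upper triangular test matrix
lam = A[0, 0]                                       # eigenvalue belonging to v = e_1
v = np.array([[1.0], [0.0]], dtype=complex)
P = v @ v.conj().T                                  # the rank-one matrix vv*

lyap = A @ P + P @ A.conj().T                       # L_A(vv*)
stein = P - A @ P @ A.conj().T                      # S_A(vv*)
```

Here `lam` has real part 0.5 and modulus larger than 1, so `P` is an eigenvector of $L_A$ with eigenvalue 1 and of $S_A$ with the negative eigenvalue $1 - |\lambda|^2 = -3.25$.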


Let $B, C_1, C_2, \ldots, C_k$ be in $\mathbb{C}^{n \times n}$. We define a linear transformation $T$ on $H^n$ by

\[ T(X) = BXB^* - \sum_{i=1}^k C_i X C_i^*. \]

This class of operators has been studied in the context of positive operators in Hans [10]. When $B = A + I$, $k = 2$, $C_1 = A$, $C_2 = I$, the operator $T$ reduces to the Lyapunov transformation, and for $B = I$, $k = 1$, $C_1 = A$, we get the Stein transformation. From Theorems 25.6 and 25.7, the pSSM property and the rCCS property are equivalent for the Lyapunov and the Stein transformations. Motivated by this, one can ask whether the same result holds for the operator $T$. We now study a particular case of $T$. By considering $B = A + I$, $k = 3$, $C_1 = I$, $C_2 = A$, $C_3 = A$ in $T$, we get

\[ L(X) = AX + XA^* - AXA^*. \]

We prove an equivalence relation between the pSSM property and the rCCS property for $L(X) = AX + XA^* - AXA^*$.
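The three special cases of $T$ mentioned above follow from expanding $(A + I)X(A + I)^*$, and they can also be confirmed numerically. The sketch below is an illustration on random data; the function name `T` mirrors the operator in the text, while the random matrices and the seed are assumptions.

```python
import numpy as np

def T(B, Cs, X):
    """T(X) = B X B* - sum_i C_i X C_i*."""
    out = B @ X @ B.conj().T
    for C in Cs:
        out = out - C @ X @ C.conj().T
    return out

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = M + M.conj().T                      # a Hermitian test matrix
I = np.eye(n)
Ah = A.conj().T

lyapunov_case = T(A + I, [A, I], X)     # expected: AX + XA*
stein_case = T(I, [A], X)               # expected: X - AXA*
special_case = T(A + I, [I, A, A], X)   # expected: AX + XA* - AXA*
```

The last case uses $C_2 = C_3 = A$, so the term $AXA^*$ produced by $BXB^*$ is cancelled once and subtracted once more.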


Theorem 25.8 Let $L(X) = AX + XA^* - AXA^*$. If $L^{\#}$ exists, then the following are equivalent:


(i) $L$ has the pSSM property.
(ii) $L$ has the rCCS property.


Proof (i) $\Rightarrow$ (ii): It is obvious.
(ii) $\Rightarrow$ (i): Suppose $0 \ne X \in R(L)$ is such that $X \succeq 0$ and $XL(X) = L(X)X \preceq 0$. Since $XL(X) = L(X)X$, there is a unitary matrix $U$, i.e., $U^*U = UU^* = I$, such that

\[ D = U^*XU \quad \text{and} \quad E = U^*L(X)U, \]

where $D$ is a diagonal matrix with nonnegative entries and $E$ is a diagonal matrix with $ED \preceq 0$. As we have $X \succeq 0$ and $XL(X) = L(X)X \preceq 0$, by the rCCS property of $L$, $XL(X) = 0$, which implies $ED = 0$.

Case I: Suppose $X$ is invertible and $X \in R(L)$. Then $L(X) = 0$, which gives us $X = 0$ as $L^{\#}L(X) = X$. Thus we get a contradiction.

Case II: Suppose $X$ is not invertible. Then

\[ D = \begin{bmatrix} D_1 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad E = \begin{bmatrix} 0 & 0 \\ 0 & E_1 \end{bmatrix}, \]

where $D_1 \succ 0$. Let $U^*AU = \begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix}$, where $B_1$ is compatible with $D_1$. As $E = U^*L(X)U$, we get

\[ \begin{bmatrix} B_1 D_1 + D_1 B_1^* - B_1 D_1 B_1^* & D_1 B_3^* - B_1 D_1 B_3^* \\ B_3 D_1 - B_3 D_1 B_1^* & -B_3 D_1 B_3^* \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & E_1 \end{bmatrix}. \]

This gives that

\[ B_1 D_1 + D_1 B_1^* - B_1 D_1 B_1^* = 0, \quad B_3 D_1 B_3^* = -E_1 \quad \text{and} \quad B_3 D_1 - B_3 D_1 B_1^* = 0. \]

We will prove here that $I - B_1^*$ is invertible. On the contrary, let us assume that $v$ is a non-zero vector such that $(I - B_1^*)v = 0$. This implies $B_1^* v = v$, so

\[ \langle (B_1 D_1 + D_1 B_1^* - B_1 D_1 B_1^*)v, v \rangle = 0 \implies \langle D_1 v, v \rangle = 0. \]

It contradicts the fact that $D_1 \succ 0$. Hence $I - B_1^*$ is invertible. As $B_3 D_1 - B_3 D_1 B_1^* = B_3 D_1 (I - B_1^*) = 0$, this implies $B_3 = 0$. So $E_1 = 0$, which gives that $E = 0$. Therefore, we get $L(X) = 0$, which implies that $X = 0$ from (T3). Thus we get a contradiction. Hence $L$ has the pSSM property.


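In Theorem 25.6, we proved the equivalence of the rCCS and the pSSM properties for the Lyapunov transformation. Motivated by this, we prove the same relation for Lyapunov-like transformations. A linear transformation $\mathcal{L}$ on $\mathcal{V}$ is called Lyapunov-like if

\[ x, y \in K \text{ and } \langle x, y \rangle = 0 \implies \langle \mathcal{L}(x), y \rangle = 0. \]


Theorem 25.9 Let $\mathcal{L}$ be a Lyapunov-like transformation. If $\mathcal{L}^{\#}$ exists, then the following are equivalent:


(i) $\mathcal{L}$ has the pSSM property.
(ii) $\mathcal{L}$ has the rCCS property.


Proof (i) $\Rightarrow$ (ii): This is obvious.

(ii) $\Rightarrow$ (i): Let $0 \le x \in R(\mathcal{L})$ and let $x$ operator commute with $\mathcal{L}(x)$ such that $x \circ \mathcal{L}(x) \le 0$. As $\mathcal{L}$ has the rCCS property, $x \circ \mathcal{L}(x) = 0$, and from this we get $\langle x, \mathcal{L}(x) \rangle = 0$. Now $x \ge 0$ and $x \circ \mathcal{L}(x) = x \circ (\mathcal{L}(x)^+ - \mathcal{L}(x)^-) = 0$, which implies $x \circ \mathcal{L}(x)^+ = x \circ \mathcal{L}(x)^- = 0$. From this we get $\langle x, \mathcal{L}(x)^+ \rangle = \langle x, \mathcal{L}(x)^- \rangle = 0$. Here $x \ge 0$, $\mathcal{L}(x)^+ \ge 0$ and $\langle x, \mathcal{L}(x)^+ \rangle = 0$. As $\mathcal{L}$ is a Lyapunov-like transformation, we have

\[ \langle \mathcal{L}(x), \mathcal{L}(x)^+ \rangle = 0. \]

Similarly, $\langle \mathcal{L}(x), \mathcal{L}(x)^- \rangle = 0$. Therefore, $\langle \mathcal{L}(x), \mathcal{L}(x) \rangle = 0$, which implies $\|\mathcal{L}(x)\|^2 = 0$. Hence $\mathcal{L}(x) = 0$. As $x \in R(\mathcal{L})$ and $\mathcal{L}^{\#}$ exists, $x = 0$, as $\mathcal{L}^{\#}\mathcal{L}(x) = x$. Thus we have our claim.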
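Proposition 25.2 Let $\mathcal{L}$ be linear on $\mathcal{V}$ with $\langle x, \mathcal{L}(x) \rangle \ge 0$ for all $x \in R(\mathcal{L})$. Then $\mathcal{L}$ has the rCS property.


Proof Let $x \in R(\mathcal{L})$. Assume $x$ operator commutes with $\mathcal{L}(x)$ such that $x \circ \mathcal{L}(x) \le 0$. This implies $\langle x, \mathcal{L}(x) \rangle \le 0$. Therefore, from our hypothesis, $\langle x, \mathcal{L}(x) \rangle = 0$. By the operator commutativity of $x$ and $\mathcal{L}(x)$, we get $x \circ \mathcal{L}(x) = 0$. Thus $\mathcal{L}$ has the rCS property.


Theorem 25.10 Let $\mathcal{L}$ be linear on $\mathcal{V}$ such that $\mathcal{L}^{\#}$ exists. If $\|\mathcal{L}\| \le 1$, then $\mathcal{L}\mathcal{L}^{\#} - \mathcal{L}$ has the rCS property.


Proof Let $u$ be in $R(\mathcal{L})$. Then $\mathcal{L}\mathcal{L}^{\#}(u) = u$, which implies that

\[ \langle u, (\mathcal{L}\mathcal{L}^{\#} - \mathcal{L})u \rangle = \langle u, \mathcal{L}\mathcal{L}^{\#}(u) \rangle - \langle u, \mathcal{L}u \rangle = \langle u, u \rangle - \langle u, \mathcal{L}(u) \rangle \ge (1 - \|\mathcal{L}\|)\|u\|^2 \ge 0. \]

Since the range of $\mathcal{L}\mathcal{L}^{\#} - \mathcal{L}$ is contained in $R(\mathcal{L})$, Proposition 25.2 shows that $\mathcal{L}\mathcal{L}^{\#} - \mathcal{L}$ has the rCS property.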


Theorem 25.11 Let $\mathcal{L}$ be a self-adjoint Z-transformation. Then the following are equivalent:


(i) $\mathcal{L}$ is monotone.
(ii) $\mathcal{L}$ has the column sufficiency property.
(iii) $\mathcal{L}$ has the rCS property.
(iv) $\mathcal{L}$ has the rCCS property.
(v) $\mathcal{L}$ has the pSSM property.


Proof (i) $\Rightarrow$ (ii): This holds by Theorem 6.1 in Tao and Jeyaraman [18].
(ii) $\Rightarrow$ (iii) and (iii) $\Rightarrow$ (iv) follow from the definitions.

(iv) $\Rightarrow$ (v): Let $x \in R(\mathcal{L}) \cap K$ be such that $x$ operator commutes with $\mathcal{L}(x)$ and $x \circ \mathcal{L}(x) \le 0$. From the rCCS property of $\mathcal{L}$, $x \circ \mathcal{L}(x) = 0$, and this implies $\langle x, \mathcal{L}(x) \rangle = 0$. Since $\mathcal{L}$ is a self-adjoint Z-transformation, by Theorem 25.4, all the eigenvalues of $\mathcal{L}$ are real and nonnegative, giving that $\mathcal{L}(x) = 0$. As $\mathcal{L}$ is self-adjoint, $\mathcal{L}^{\#}$ exists. Since $x \in R(\mathcal{L})$, $\mathcal{L}^{\#}\mathcal{L}(x) = x$. This gives that $x = 0$.

(v) $\Rightarrow$ (i): Let $\mathcal{L}$ have the pSSM property and be a self-adjoint Z-transformation. By Lemma 3.5 in Jeyaraman et al. [12], $\mathcal{L}$ is semipositive stable. As $\mathcal{L}$ is self-adjoint, all the eigenvalues of $\mathcal{L}$ are real and nonnegative. Hence $\mathcal{L}$ is monotone.


25.5 The Relaxation Transformation 


Given a fixed Jordan frame $\{e_i\}_{i=1}^r$ in $\mathcal{V}$, we define the relaxation transformation as follows. For $A \in \mathbb{R}^{r \times r}$ and any $x \in \mathcal{V}$ with the Peirce decomposition

\[ x = \sum_{i=1}^r x_i e_i + \sum_{i<j} x_{ij}, \]

we write the relaxation transformation as

\[ R_A(x) = \sum_{i=1}^r y_i e_i + \sum_{i<j} x_{ij}, \]

where $[y_1, y_2, \ldots, y_r]^T = A([x_1, x_2, \ldots, x_r]^T)$. This transformation is the generalization of a concept studied in Gowda and Song [5] for $\mathcal{V} = S^n$. Various authors have studied this transformation; see Tao and Jeyaraman [18], Tao [17]. Here, we characterize the rCS and pSSM properties of $R_A$.
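For intuition, the relaxation transformation is easy to write down when $\mathcal{V}$ is the space of real symmetric $r \times r$ matrices and the Jordan frame consists of the diagonal matrix units $E_{11}, \ldots, E_{rr}$ (both are assumptions made for this sketch). In that setting the Peirce coefficients $x_i$ are the diagonal entries and the parts $x_{ij}$ are the off-diagonal entries, so $R_A$ acts by $A$ on the diagonal and leaves the rest unchanged. The function name and data below are illustrative.

```python
import numpy as np

def relaxation(A, X):
    """R_A(X) on symmetric matrices with Jordan frame E_11, ..., E_rr."""
    X = np.asarray(X, dtype=float)
    out = X.copy()                          # off-diagonal Peirce parts are kept
    np.fill_diagonal(out, A @ np.diag(X))   # [y_1..y_r]^T = A [x_1..x_r]^T
    return out

A = np.array([[0.0, 1.0], [2.0, 0.0]])
X = np.array([[3.0, 5.0], [5.0, 7.0]])
RX = relaxation(A, X)                       # diagonal becomes A @ (3, 7) = (7, 6)
```

With $A = I$ the sketch returns `X` itself, which matches the definition since then $y_i = x_i$ for every $i$.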


Theorem 25.12 Let $A \in \mathbb{R}^{r \times r}$. Then for $R_A$, the following are equivalent.


(i) $A$ is a range column sufficient matrix.
(ii) $R_A$ has the rCS property.


Proof (i) $\Rightarrow$ (ii): Let $\{e_i\}_{i=1}^r$ be the given Jordan frame. Suppose $x \in \mathcal{V}$ is in the range of $R_A$, $x$ operator commutes with $R_A(x)$, and $x \circ R_A(x) \le 0$. Then there is some $v \in \mathcal{V}$ such that

\[ v = \sum_{i=1}^r v_i e_i + \sum_{i<j} v_{ij} \quad \text{and} \quad R_A(v) = x, \]

where $[x_1, x_2, \ldots, x_r]^T = A[v_1, v_2, \ldots, v_r]^T$. Let $[y_1, y_2, \ldots, y_r]^T = A[x_1, x_2, \ldots, x_r]^T$; then we can write

\[ x = \sum_{i=1}^r x_i e_i + \sum_{i<j} v_{ij} \quad \text{and} \quad R_A(x) = \sum_{i=1}^r y_i e_i + \sum_{i<j} v_{ij}. \]

By an easy calculation, we get $x \circ e_l = x_l e_l + \frac{1}{2} \sum_{i<l} v_{il} + \frac{1}{2} \sum_{l<j} v_{lj}$ for $1 \le l \le r$, which implies

\[ 0 \ge \langle x \circ R_A(x), e_l \rangle = \langle R_A(x), x \circ e_l \rangle = x_l y_l \|e_l\|^2 + \frac{1}{2} \sum_{1 \le i < l} \|v_{il}\|^2 + \frac{1}{2} \sum_{l < j \le r} \|v_{lj}\|^2. \tag{25.2} \]

This gives that $x_l y_l \le 0$. Since $[x_1, x_2, \ldots, x_r]^T$ is in the range of $A$ and, by our assumption, $A$ is range column sufficient, $x_l y_l = 0$. From Eq. (25.2), we have $v_{il} = v_{lj} = 0$ for all $1 \le i < l$ and $l < j \le r$. Then $x \circ R_A(x) = 0$. Thus $R_A$ has the rCS property.

(ii) $\Rightarrow$ (i): Let $x = [x_1, x_2, \ldots, x_r]^T$ be in $R(A)$ with $x * Ax \le 0$. Then there is $y \in \mathbb{R}^r$ such that $A(y) = x$. Now set $z := \sum_{i=1}^r x_i e_i$; then $R_A(z) = \sum_{i=1}^r (Ax)_i e_i$. Here $z = \sum_{i=1}^r (Ay)_i e_i$, as $A(y) = x$ in $\mathbb{R}^r$, which gives that $z$ is in the range of $R_A$, and $z \circ R_A(z) = \sum_{i=1}^r x_i (Ax)_i e_i \le 0$. Since $R_A$ has the rCS property, $z \circ R_A(z) = 0$, which gives that $x * Ax = 0$. Thus $A$ is range column sufficient.


Theorem 25.13 Let $A \in \mathbb{R}^{r \times r}$. Then $A$ is strictly range semimonotone if and only if $R_A$ has the pSSM property.


Proof This can be proved by an argument similar to that of the above theorem.


25.6 Conclusion 


In this paper, we have introduced and studied the rCS property and the pSSM property on EJA. We have shown that the intersection of the solution set of the LCP with the range of a linear transformation is convex if the transformation has the rCS property and the range cross commutative property. We showed that a linear transformation $\mathcal{L}$ has the pSSM property together with the range cross commutative property on $K$ if and only if the intersection of the solution set of the LCP with the range of the linear transformation is the zero set on the symmetric cone. In particular, we have shown the equivalence of the pSSM and the rCCS properties for some special transformations. Finally, we have characterized the rCS and the pSSM properties for the relaxation transformation.
Chapter 26
On the Proofs of Formulae by Mahavira
and Brahmagupta


Narayana Acharya 


Abstract In this article, independent proofs for the formulae enunciated by Indian 
mathematician Mahavira for the sum of squares and the sum of cubes of n terms of 
an arithmetic sequence are given. Also, a detailed study of cyclic quadrilaterals is 
made in the light of the works done by another Indian mathematician Brahmagupta. 
The formulae given by Brahmagupta for the area of a cyclic quadrilateral and lengths 
of the diagonals of a cyclic quadrilateral are proved. Further, a formula for the area 
of a general quadrilateral, which is not cyclic, is obtained. 


Keywords Arithmetic sequence - Quadrilaterals - Cyclic quadrilaterals 


Mathematics Subject Classification (2010) 01A32 - 40-03 - 11B25 - 51A05 - 
97G60 


26.1 On Mahavira Formulae for Sum of the Squares and
Sum of the Cubes of n Terms of an Arithmetic
Sequence


Mahaviracarya (850 A.D.) in his Ganita-sara-sangraha (see [2]), Chap. 6, shloka
298, says as follows:

[Sanskrit shloka; Devanagari text not reproduced]


This shloka enunciates a formula to find the sum of squares of the first $n$ terms of
an arithmetic sequence whose first term is $a$ and common difference $d$. If we denote
the sum of the squares of $n$ terms by $S_2$, it says
\[ S_2 = \left[ \left( \frac{2n-1}{6}\, d^2 + ad \right)(n-1) + a^2 \right] n, \]


where $n$ stands for the number of terms of the arithmetic series. Also, in another
shloka, No. 304 of the same chapter, he says

[Sanskrit shloka; Devanagari text not reproduced]

which means that the sum $S_3$ of the cubes of $n$ terms of the same arithmetic sequence is
equal to $S_1^2 d + S_1 a(a-d)$, where $S_1$ denotes the sum of $n$ terms of the arithmetic
sequence.

These formulae enunciated by Mahaviracarya have not been proved anywhere so far. We
present proofs of them below.

First, we recall that a sequence of numbers $t_1, t_2, \dots, t_n, \dots$ is an arithmetic
sequence if $t_1 = a$ and $t_{i+1} - t_i = d$ (a constant) for all $i \ge 1$. The term $t_1$ is called the
first term, $t_i$ denotes the $i$th term of the sequence, and $d$ is called the common
difference of the arithmetic sequence.

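We use the following notation:
\[ t_1 + t_2 + \cdots + t_n = \sum_{i=1}^{n} t_i = S_1, \]
\[ t_1^2 + t_2^2 + \cdots + t_n^2 = \sum_{i=1}^{n} t_i^2 = S_2, \]
\[ t_1^3 + t_2^3 + \cdots + t_n^3 = \sum_{i=1}^{n} t_i^3 = S_3. \]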


First, we get the formula for $t_n$ and the formula for $S_1$.
From the definition of the arithmetic sequence,
\[ t_n - t_1 = t_n - a = \sum_{i=1}^{n-1} (t_{i+1} - t_i) = (t_n - t_{n-1}) + (t_{n-1} - t_{n-2}) + \cdots + (t_2 - t_1). \]
Since $t_{i+1} - t_i = d$ for all $i$, $t_n - a = (n-1)d$. Therefore
\[ t_n = a + (n-1)d. \tag{26.1} \]


Now consider
\[ t_{n+1}^2 - t_1^2 = \sum_{i=1}^{n} (t_{i+1}^2 - t_i^2) = (t_2^2 - t_1^2) + (t_3^2 - t_2^2) + \cdots + (t_{n+1}^2 - t_n^2). \]
Use the algebraic identity $A^2 - B^2 = (A - B)(A + B)$ on both sides of the above
equation. Since $t_{n+1} = a + nd$, we get
\[ (a + nd)^2 - a^2 = (a + nd + a)(a + nd - a) = \sum_{i=1}^{n} (t_{i+1} - t_i)(t_{i+1} + t_i) = \sum_{i=1}^{n} d\,(t_i + d + t_i) = d \sum_{i=1}^{n} (2t_i + d) = d(2S_1 + nd), \]
i.e.,
\[ nd(2a + nd) = d(2S_1 + nd) \implies n(2a + nd) = 2S_1 + nd, \]
or
\[ n(2a + nd - d) = 2S_1, \]
or
\[ S_1 = \frac{n}{2}\bigl(2a + (n-1)d\bigr). \tag{26.2} \]
It can also be observed that
\[ S_1 = n\,\frac{t_1 + t_n}{2}. \tag{26.3} \]
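Now we proceed to get the formula for $S_2$.
Consider
\[ t_{n+1}^3 - t_1^3 = \sum_{i=1}^{n} (t_{i+1}^3 - t_i^3) = (t_2^3 - t_1^3) + (t_3^3 - t_2^3) + \cdots + (t_{n+1}^3 - t_n^3). \]
Now use the algebraic identity $A^3 - B^3 = (A - B)(A^2 + AB + B^2)$ on either side of
the above equation:
\[ (t_{n+1} - t_1)(t_{n+1}^2 + t_{n+1} t_1 + t_1^2) = \sum_{i=1}^{n} (t_{i+1} - t_i)(t_{i+1}^2 + t_{i+1} t_i + t_i^2) \]
\[ \implies (a + nd - a)\bigl[(a + nd)^2 + a(a + nd) + a^2\bigr] = \sum_{i=1}^{n} d\,\bigl[(t_i + d)^2 + t_i(t_i + d) + t_i^2\bigr] \]
\[ \implies nd\,(3a^2 + 3and + n^2 d^2) = d \sum_{i=1}^{n} (3t_i^2 + 3d\,t_i + d^2) \]
\[ \implies n(3a^2 + 3and + n^2 d^2) = 3\sum_{i=1}^{n} t_i^2 + 3d \sum_{i=1}^{n} t_i + d^2 n = 3S_2 + 3dS_1 + d^2 n \]
\[ \implies S_2 = n\left(a^2 + and + \frac{n^2 d^2}{3}\right) - dS_1 - \frac{d^2 n}{3}. \]
Therefore, using (26.2),
\[ S_2 = n\left(a^2 + and + \frac{n^2 d^2}{3}\right) - \frac{nd}{2}\bigl(2a + (n-1)d\bigr) - \frac{d^2 n}{3} \]
\[ = n\left[a^2 + (n-1)ad + \frac{(n^2 - 1)d^2}{3} - \frac{(n-1)d^2}{2}\right] \]
\[ = n\left[a^2 + (n-1)\left(ad + \left(\frac{n+1}{3} - \frac{1}{2}\right)d^2\right)\right] \]
\[ = n\left[a^2 + (n-1)\left(ad + \frac{2n-1}{6}\,d^2\right)\right]. \tag{26.4} \]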


This is the formula enunciated by Mahaviracarya mentioned in the first shloka above. 
Now we proceed further to simplify this formula to a more elegant form. 


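\[ S_2 = n\left[a^2 + (n-1)\left(ad + \frac{2n-1}{6}\,d^2\right)\right] \]
\[ = \frac{n}{6}\bigl[6a^2 + 6(n-1)ad + (2n-1)(n-1)d^2\bigr] \]
\[ = \frac{n}{6}\bigl[6a^2 + 6(n-1)ad + \bigl(2(n-1) + 1\bigr)(n-1)d^2\bigr] \]
\[ = \frac{n}{6}\bigl\{2a^2 + 4a^2 + 6(n-1)ad + (n-1)d^2 + 2(n-1)^2 d^2\bigr\} \]
\[ = \frac{n}{6}\bigl\{\bigl(2a^2 + (n-1)d^2\bigr) + 2\bigl(2a + (n-1)d\bigr)\bigl[a + (n-1)d\bigr]\bigr\} \]
\[ = \frac{1}{3}\left\{\frac{n}{2}\bigl(2a^2 + (n-1)d^2\bigr) + 2\cdot\frac{n}{2}\bigl(2a + (n-1)d\bigr)\bigl[a + (n-1)d\bigr]\right\}. \]
Now recall that $\frac{n}{2}\bigl(2a + (n-1)d\bigr) = S_1$, the sum of the first $n$ terms of the given arithmetic
sequence whose first term is $a$ and common difference $d$; that $\frac{n}{2}\bigl(2a^2 + (n-1)d^2\bigr)$ is
the sum, denoted by $S_1'$, of the first $n$ terms of the associated arithmetic sequence
whose first term is $a^2$ and common difference $d^2$; and that $a + (n-1)d = t_n$, the $n$th term
of the given arithmetic sequence. Then it follows that
\[ S_2 = \frac{1}{3}\bigl\{S_1' + 2S_1 t_n\bigr\}. \]
This is the most elegant form for the sum $S_2$, identified by the present author.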
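
As a quick sanity check of (26.4) and of the compact form above, the following sketch compares both closed forms with a brute-force sum. The function names are illustrative and not part of the source; exact rational arithmetic via Python's `fractions` module is assumed so that the comparison is free of rounding.

```python
from fractions import Fraction


def ap_terms(a, d, n):
    """First n terms of the arithmetic sequence a, a + d, a + 2d, ..."""
    return [a + i * d for i in range(n)]


def s2_mahavira(a, d, n):
    """Sum of squares via (26.4): n[a^2 + (n-1)(ad + (2n-1)d^2/6)]."""
    return n * (a * a + (n - 1) * (a * d + Fraction(2 * n - 1, 6) * d * d))


def s2_compact(a, d, n):
    """Sum of squares via the compact form S2 = (S1' + 2*S1*t_n) / 3."""
    s1 = Fraction(n, 2) * (2 * a + (n - 1) * d)                # sum of the terms
    s1_prime = Fraction(n, 2) * (2 * a * a + (n - 1) * d * d)  # AP with a^2, d^2
    t_n = a + (n - 1) * d                                      # nth term (26.1)
    return (s1_prime + 2 * s1 * t_n) / 3


def s2_brute(a, d, n):
    """Direct summation of the squares, used only for comparison."""
    return sum(t * t for t in ap_terms(a, d, n))
```

For instance, with $a = 3$, $d = 5$, $n = 7$ all three functions should agree on the same value.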

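Now we proceed further to get the Mahavira formula for $S_3$, the sum of the cubes of
$n$ terms of the given arithmetic sequence whose first term is $a$ and common difference
$d$.

Consider
\[ \sum_{i=1}^{n} (t_{i+1}^4 - t_i^4) = (t_2^4 - t_1^4) + (t_3^4 - t_2^4) + \cdots + (t_{n+1}^4 - t_n^4) = t_{n+1}^4 - t_1^4 = (a + nd)^4 - a^4. \]
Observe that $A^4 - B^4 = (A^2 + B^2)(A^2 - B^2) = (A^2 + B^2)(A + B)(A - B)$. We
use this identity on either side of the above equation. Then we get
\[ \text{R.H.S.} = (a + nd)^4 - a^4 = \bigl[(a + nd)^2 + a^2\bigr]\bigl[a + nd + a\bigr]\bigl[nd\bigr] \]
\[ = nd\,(2a + nd)(2a^2 + 2and + n^2 d^2) \]
\[ = dn\bigl(2a + (n-1)d + d\bigr)\bigl[2a^2 + nd\bigl(2a + (n-1)d + d\bigr)\bigr] \]
\[ = d(2S_1 + nd)\bigl[2a^2 + d(2S_1 + nd)\bigr], \]
\[ \text{L.H.S.} = \sum_{i=1}^{n} (t_{i+1}^2 - t_i^2)(t_{i+1}^2 + t_i^2) \]
\[ = \sum_{i=1}^{n} (t_{i+1} - t_i)(t_{i+1} + t_i)\bigl[(t_i + d)^2 + t_i^2\bigr] \]
\[ = d \sum_{i=1}^{n} (2t_i + d)(2t_i^2 + 2d\,t_i + d^2) \]
\[ = d \sum_{i=1}^{n} (4t_i^3 + 6d\,t_i^2 + 4d^2 t_i + d^3) \]
\[ = d\,(4S_3 + 6dS_2 + 4d^2 S_1 + d^3 n). \]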
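Equating the simplified results on the L.H.S. and R.H.S., we get
\[ d\,(4S_3 + 6dS_2 + 4d^2 S_1 + d^3 n) = d(2S_1 + nd)\bigl[2a^2 + d(2S_1 + nd)\bigr] \]
\[ \therefore\ 4S_3 + 6dS_2 + 4d^2 S_1 + d^3 n = (2S_1 + nd)(2a^2) + d(2S_1 + nd)^2. \]
Substituting for $S_2$ from the previous Mahavira formula as modified, i.e., $S_2 = \frac{1}{3}\{S_1' + 2S_1 t_n\}$ with $t_n = a + (n-1)d$, we get
\[ 4S_3 + d\bigl(2S_1' + 4(a + (n-1)d)S_1\bigr) + 4d^2 S_1 + d^3 n \]
\[ = 4a^2 S_1 + 2a^2 nd + d\,(4S_1^2 + 4S_1 nd + n^2 d^2) \]
\[ = 4a^2 S_1 + nd\bigl(2a^2 + (n - 1 + 1)d^2\bigr) + 4S_1^2 d + 4S_1 nd^2 \]
\[ = 4a^2 S_1 + 4S_1^2 d + 2S_1' d + nd^3 + 4S_1 nd^2, \]
where we used $nd\bigl(2a^2 + (n-1)d^2\bigr) = 2S_1' d$. Hence
\[ 4S_3 = 4a^2 S_1 + 4S_1^2 d + 2S_1' d + 4S_1 nd^2 - 2S_1' d - 4adS_1 - 4nd^2 S_1 \]
\[ \implies 4S_3 = 4a^2 S_1 - 4S_1 ad + 4S_1^2 d \]
\[ \implies S_3 = (a^2 - ad)S_1 + dS_1^2 \]
\[ \implies S_3 = S_1^2 d + a(a - d)S_1. \]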


This is Mahavira's formula, enunciated in the second shloka above, for the sum
of the cubes of $n$ terms of the given arithmetic sequence.
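
The identity $S_3 = S_1^2 d + a(a-d)S_1$ can be checked numerically. The sketch below is only an illustration: the helper names are hypothetical, and exact rational arithmetic is assumed so that equality can be tested directly against a brute-force sum of cubes.

```python
from fractions import Fraction


def s1(a, d, n):
    """Sum of the first n terms, formula (26.2)."""
    return Fraction(n, 2) * (2 * a + (n - 1) * d)


def s3_mahavira(a, d, n):
    """Sum of cubes via Mahavira's formula S3 = S1^2 * d + a(a - d) * S1."""
    s = s1(a, d, n)
    return s * s * d + a * (a - d) * s


def s3_brute(a, d, n):
    """Direct summation of the cubes, used only for comparison."""
    return sum((a + i * d) ** 3 for i in range(n))
```

With $a = d = 1$ the formula collapses to $S_3 = S_1^2$, the familiar identity for the sum of the first $n$ cubes.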
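If we put $a = d = 1$ in these formulae, we get
\[ S_2 = 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}, \]
\[ S_3 = 1^3 + 2^3 + 3^3 + \cdots + n^3 = 1 \cdot S_1^2 = \left(\frac{n(n+1)}{2}\right)^2, \]
because then
\[ S_1 = \frac{n(n+1)}{2} \quad\text{and}\quad S_1' = \frac{n(n+1)}{2}. \]
The formulae for this special case have been mentioned by Aryabhata in his
Aryabhatiyam ([4]) as Krticitika Sutra and Ghanaciti Sutra.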
26.2 A Study of Cyclic Quadrilaterals


While discussing cyclic quadrilaterals, Brahmagupta in his book Brahma-sphuta-siddhanta
(see [3]) gives two formulae: one for the area of a cyclic quadrilateral
and the other for the lengths of the diagonals of a cyclic
quadrilateral.

In Chap. 12, Shloka 21, Brahmagupta says:

[Sanskrit shloka; Devanagari text not reproduced]

The second line of the shloka says that if $s$ is the semi-perimeter of the (cyclic) quadrilateral
and $a, b, c, d$ are its sides, the exact area is $\sqrt{(s-a)(s-b)(s-c)(s-d)}$.
Also, in Shloka 28 of the same chapter, he says

[Sanskrit shloka; Devanagari text not reproduced]

This says that the lengths of the diagonals of the cyclic quadrilateral are
\[ \sqrt{\frac{ab+cd}{ad+bc}\,(ac+bd)} \quad\text{and}\quad \sqrt{\frac{ad+bc}{ab+cd}\,(ac+bd)}. \]


These formulae have been only enunciated and not proved. In this chapter, we make 
a study in detail of the cyclic quadrilateral where we present the proofs for the above 
results and also get some more relevant formulae that can be used beneficially. 

First of all, we fix some notations and give certain definitions that we are going 
to use in our study. 

Let $\mathcal{S}$ be a circle of a chosen radius $r$. Let $P$, $Q$, $R$, and $S$ be four
distinct points on this circle, taken for convenience in the anticlockwise order.
Then the quadrilateral obtained by joining $PQ$, $QR$, $RS$, and $SP$ is called a cyclic
quadrilateral. In other words, a quadrilateral $PQRS$ whose vertices all lie on one
and the same circle is called a cyclic quadrilateral. The circle $\mathcal{S}$ which passes
through the vertices $P$, $Q$, $R$, $S$ is called the circumcircle of the quadrilateral. Let
$PQ = a$, $QR = b$, $RS = c$, and $SP = d$ be the measures of the sides of this quadrilateral.
$P$, $Q$, $R$, and $S$ are the vertices of the quadrilateral, and $PQ$, $QR$, $RS$, and $SP$ are its
sides. The pairs $(P, R)$ and $(Q, S)$ are pairs of opposite vertices.
The line segments $PR$ and $QS$ joining them are called the diagonals of the quadrilateral.
$(PQ, RS)$ and $(PS, QR)$ are pairs of opposite sides. Let $PR = x$ and $QS = y$ be the
measures of the diagonals. The largest side of the quadrilateral (which, without loss of generality,
can be chosen to be $PQ = a$) will be called the base of the quadrilateral. Its
opposite side $RS = c$ is called the face of the quadrilateral. The remaining two sides
$b$ and $d$, which meet the base at $Q$ and $P$, respectively, are called the adjacent sides
of the quadrilateral. The quadrilateral $PQRS$ will be denoted by $Q_c$: the letter $Q$
represents the quadrilateral and the subscript $c$ attached to $Q$ denotes the face of the
quadrilateral.

Now, mark two points $R'$ and $S'$ on the arc of the circle determined
by the base $PQ$ which contains $R$ and $S$, such that $QR' = PS' = RS = c$. Then
arc $QR'$ = arc $PS'$ = arc $SR$. Now, arc $SR'$ = arc $QS$ $-$ arc $QR'$ = arc $QS$ $-$ arc $RS$ =
arc $QR$, and therefore arc $SR'$ = arc $QR$. Hence, chord $R'S$ = chord $QR$ = $b$.
Similarly, chord $RS'$ = chord $PS$ = $d$. Thus, we get two more cyclic quadrilaterals
$PQR'S$ and $PQRS'$, both of which have the same base. The face of $PQR'S$ is $R'S = b$
and the face of $PQRS'$ is $RS' = d$. These two quadrilaterals are again cyclic, their sides have
the same measures as those of $PQRS$, but
they have faces $b$ and $d$, respectively, and we denote them by $Q_b$ and $Q_d$.

The quadrilaterals $Q_c$, $Q_b$, $Q_d$ have the following properties. The diagonals of $Q_c$ are $PR$ and
$QS$, the diagonals of $Q_b$ are $PR'$ and $QS$, and the diagonals of $Q_d$ are $QS'$ and $PR$. Thus,
$Q_b$ and $Q_d$ each have one diagonal in common with $Q_c$. The uncommon diagonals $PR'$
and $QS'$, obtained in the process of exchanging the face $c$ with $b$ and $c$ with $d$, respectively,
are such that arc $PR'$ = arc $PQR'$ = arc $PQ$ + arc $QR'$ = arc $PQ$ + arc $PS'$ (since
$PS' = QR' = RS$) = arc $QS'$. Therefore, chord $QS'$ = chord $PR'$. Hence, these
uncommon diagonals are congruent to each other, and their measure will be denoted
by $z$, which will be called the para-diagonal or transcendental diagonal of $Q_c$. Thus, $Q_c$
has two actual diagonals $x$, $y$ and para-diagonal $z$. Similarly, $Q_b$ has actual diagonals
$y$, $z$ and para-diagonal $x$, and $Q_d$ has actual diagonals $x$, $z$ and para-diagonal $y$.


Thus, every cyclic quadrilateral is associated with three diagonals: two of them
are the actual diagonals and the third is the para-diagonal, which can be obtained by
exchanging the face of that quadrilateral with one of its adjacent sides.

Now we proceed to associate with each cyclic quadrilateral three pairs of supplementary
angles and find the connection between these three pairs of supplementary
angles and the three diagonals of the quadrilateral.
We know that any chord of a circle subtends an angle at the center and also at a point
on the circumference, and if $\theta_a$ is the measure of the angle subtended by the chord $a$ at the center, the angle
subtended by $a$ at a point on the circumference is $\frac{\theta_a}{2}$. Therefore
\[ \angle PSQ = \angle PRQ = \frac{\theta_a}{2}. \]
Similarly
\[ \angle QSR = \angle QPR = \frac{\theta_b}{2}, \qquad \angle RPS = \angle RQS = \frac{\theta_c}{2}, \]
and
\[ \angle PQS = \angle PRS = \frac{\theta_d}{2}. \]
We see that
\[ \theta_a + \theta_b + \theta_c + \theta_d = 360^\circ, \qquad \frac{\theta_a + \theta_b}{2} + \frac{\theta_c + \theta_d}{2} = 180^\circ. \]
Thus,
\[ \left(\frac{\theta_a + \theta_b}{2}, \frac{\theta_c + \theta_d}{2}\right), \quad \left(\frac{\theta_a + \theta_c}{2}, \frac{\theta_b + \theta_d}{2}\right), \quad \left(\frac{\theta_b + \theta_c}{2}, \frac{\theta_a + \theta_d}{2}\right) \]
are three pairs of supplementary angles.
Observe that
\[ \frac{\theta_a + \theta_b}{2} = \angle PSR \quad\text{and}\quad \frac{\theta_c + \theta_d}{2} = \angle PQR. \]
These are the pair of opposite angles at $Q$ and $S$ of the quadrilateral, namely the
angles subtended by the diagonal $PR$ at the remaining two vertices $Q$ and $S$ facing it.

Similarly,
\[ \frac{\theta_a + \theta_d}{2} = \angle QRS \quad\text{and}\quad \frac{\theta_b + \theta_c}{2} = \angle QPS, \]
and this second pair consists of the opposite angles at the vertices $P$ and $R$, subtended by the diagonal $QS$
at the pair of vertices $P$ and $R$ facing it. The third pair is
\[ \frac{\theta_a + \theta_c}{2} = \angle RMS = \angle PMQ, \qquad \frac{\theta_b + \theta_d}{2} = \angle PMS = \angle QMR, \]
where $M$ is the point of intersection of $PR$ and $QS$.

This pair of supplementary angles is the linear pair of angles between the actual
diagonals of the quadrilateral $PQRS$. We can also see that this pair of angles is
actually the pair of opposite angles of the quadrilateral $Q_b$ facing the transcendental
diagonal $z$, at the vertices $Q$ and $S$.

Now we proceed to prove the Brahmagupta formula for diagonals of the quadri- 
lateral. 

To prove this, we need the cosine formula of trigonometry. If $\triangle ABC$ is any triangle
with sides $BC = a$, $CA = b$, and $AB = c$, then $c^2 = a^2 + b^2 - 2ab\cos C$. This
we assume without proof. The proof can be given by using the Bodhayana Sutra
(the so-called Pythagoras theorem), which we will not give here.

Let $PQRS$ be the cyclic quadrilateral with sides $PQ = a$, $QR = b$, $RS = c$, and
$SP = d$. Let the diagonals be $PR = x$ and $QS = y$. Using the cosine rule for $\triangle PQR$ and
$\triangle PSR$ separately,
\[ PR^2 = x^2 = PQ^2 + QR^2 - 2\,PQ \cdot QR \cdot \cos Q = a^2 + b^2 - 2ab\cos Q, \tag{26.5} \]
\[ x^2 = PS^2 + SR^2 - 2\,PS \cdot SR \cdot \cos S = c^2 + d^2 - 2cd\cos S. \tag{26.6} \]
Multiply equation (26.5) by $cd$ and (26.6) by $ab$ and add the two. We get
\[ (ab + cd)x^2 = cd(a^2 + b^2) + ab(c^2 + d^2) - 2abcd(\cos Q + \cos S). \]
Since $Q$ and $S$ are supplementary angles, $S = 180^\circ - Q$. Therefore, $\cos S = -\cos Q$
and hence $\cos Q + \cos S = 0$.
Therefore
\[ (ab + cd)x^2 = cd(a^2 + b^2) + ab(c^2 + d^2) = ac(ad + bc) + bd(ad + bc) = (ad + bc)(ac + bd), \]
so that
\[ x^2 = \frac{ad + bc}{ab + cd}\,(ac + bd) \quad\text{or}\quad x = \sqrt{\frac{ad + bc}{ab + cd}\,(ac + bd)}. \tag{26.7} \]


Now, we apply the cosine formula to $\triangle PSQ$ and $\triangle RSQ$ separately. We get
\[ y^2 = a^2 + d^2 - 2ad\cos P \tag{26.8} \]
\[ y^2 = b^2 + c^2 - 2bc\cos R \tag{26.9} \]
\[ \phantom{y^2} = b^2 + c^2 + 2bc\cos P, \]
as $\cos R = -\cos P$, since $P$ and $R$ are supplementary. Multiplying (26.8) by $bc$ and the last expression by $ad$ and adding, we get
\[ (bc + ad)y^2 = bc(a^2 + d^2) + ad(b^2 + c^2) = (ab + cd)(ac + bd), \]
so that
\[ y^2 = \frac{ab + cd}{ad + bc}\,(ac + bd) \quad\text{or}\quad y = \sqrt{\frac{ab + cd}{ad + bc}\,(ac + bd)}. \tag{26.10} \]
The values of $x$ and $y$ in (26.7) and (26.10) are precisely the Brahmagupta formulae
enunciated in the relevant shloka given above.
If we use these results for the quadrilateral $PQR'S$ or $PQRS'$, we get
\[ z = \sqrt{\frac{(ab + cd)(ad + bc)}{ac + bd}}. \tag{26.11} \]


Thus, we have got all the three diagonals of the quadrilateral. 
Now, if we multiply the results in (26.7) and (26.10) above, we get $xy = ac + bd$,
which is well known as Ptolemy's theorem (see [5, p. 50]).
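
A small numerical sketch may help to see (26.7), (26.10), and (26.11) at work. The function name and the use of floating-point arithmetic are choices made for this illustration, not part of the source; the sides are assumed to be those of a genuine cyclic quadrilateral, taken in order.

```python
import math


def cyclic_diagonals(a, b, c, d):
    """Return (x, y, z): the two actual diagonals and the para-diagonal
    of a cyclic quadrilateral with consecutive sides a, b, c, d."""
    p = a * b + c * d  # equals y*z
    q = a * d + b * c  # equals x*z
    r = a * c + b * d  # equals x*y (Ptolemy)
    x = math.sqrt(q * r / p)  # (26.7)
    y = math.sqrt(p * r / q)  # (26.10)
    z = math.sqrt(p * q / r)  # (26.11)
    return x, y, z
```

For a 3 by 4 rectangle (sides 3, 4, 3, 4) both actual diagonals come out as 5, and the product $xy$ equals $ac + bd = 25$.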


Theorem 26.1 The product of the diagonals of the cyclic quadrilateral is equal to 
the sum of the products of the pairs of opposite sides of the quadrilateral. 


Now we prove Theorem 26.1 independently (without using trigonometry) using 
the properties of similar triangles. 
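Particular enunciation $PQRS$ is a cyclic quadrilateral with sides $PQ = a$, $QR = b$,
$RS = c$, and $SP = d$. $PR$ and $QS$ are the diagonals $x$ and $y$, respectively, intersecting
at $M$.

Construction Draw $SX$ such that $\angle PSQ = \angle RSX$, to meet $PR$ at $X$.

Proof Consider $\triangle PSQ$ and $\triangle XSR$, where $\angle PSQ = \angle XSR$ by construction and
$\angle SRX = \angle SRP = \angle SQP = \frac{\theta_d}{2}$. Therefore, the third angles of these triangles are equal,
namely, $\angle SPQ = \angle SXR$. Hence, these two triangles are similar. Thus
\[ \frac{PQ}{XR} = \frac{SQ}{SR}, \quad\text{i.e.,}\quad \frac{a}{XR} = \frac{y}{c}, \]
or
\[ y \cdot XR = ac. \tag{26.12} \]
Similarly, $\triangle PSX$ and $\triangle QSR$ have the corresponding angles $\angle PSX = \angle QSR$ and
$\angle SPR = \angle SQR = \frac{\theta_c}{2}$.

Therefore, the third angles of these triangles are equal. Hence, they are similar.
Therefore
\[ \frac{XP}{QR} = \frac{PS}{QS}, \quad\text{i.e.,}\quad \frac{XP}{b} = \frac{d}{y}, \]
or
\[ y \cdot XP = bd. \tag{26.13} \]
Adding (26.12) and (26.13), and noting that $XR + XP = PR = x$, we get
\[ y\,(XR + XP) = xy = ac + bd. \]
This proves Theorem 26.1.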
Having proved Theorem 26.1 independently, we can apply this theorem separately to
$Q_c$, $Q_b$, and $Q_d$. From $Q_c$, we have $xy = ac + bd$; from $Q_b$, we have $yz = ab + cd$;
and from $Q_d$, we have $xz = ad + bc$.

We can use these results to prove Brahmagupta's formulae for the measures of the
diagonals as follows:
\[ x^2 = \frac{xy \cdot xz}{yz} = \frac{(ac + bd)(ad + bc)}{ab + cd}, \qquad \therefore\ x = \sqrt{\frac{(ac + bd)(ad + bc)}{ab + cd}}. \tag{26.14} \]
Similarly
\[ y^2 = \frac{xy \cdot yz}{xz} = \frac{(ac + bd)(ab + cd)}{ad + bc}, \qquad \therefore\ y = \sqrt{\frac{(ac + bd)(ab + cd)}{ad + bc}}. \tag{26.15} \]
Also
\[ z^2 = \frac{xz \cdot yz}{xy} = \frac{(ab + cd)(ad + bc)}{ac + bd}, \qquad \therefore\ z = \sqrt{\frac{(ab + cd)(ad + bc)}{ac + bd}}. \tag{26.16} \]


This proof of the Brahmagupta formula is independent of the application of the 
trigonometric formula and is simple and elegant and purely geometrical. 

Now we proceed to prove the Brahmagupta formula for the area enclosed by the
quadrilateral $PQRS$.

Using the cosine formula for the triangles $\triangle PQR$ and $\triangle PRS$ separately, we have
seen that
\[ x^2 = a^2 + b^2 - 2ab\cos Q = c^2 + d^2 - 2cd\cos S. \]
Hence
\[ x^2 = (a + b)^2 - 2ab(1 + \cos Q) = (a + b)^2 - 4ab\cos^2\frac{Q}{2}, \tag{26.17} \]
\[ x^2 = (a - b)^2 + 2ab(1 - \cos Q) = (a - b)^2 + 4ab\sin^2\frac{Q}{2}. \tag{26.18} \]
Similarly
\[ x^2 = (c + d)^2 - 4cd\cos^2\frac{S}{2}, \tag{26.19} \]
\[ x^2 = (c - d)^2 + 4cd\sin^2\frac{S}{2}. \tag{26.20} \]
From (26.17) and (26.20),
\[ (a + b)^2 - 4ab\cos^2\frac{Q}{2} = (c - d)^2 + 4cd\sin^2\frac{S}{2} \]
\[ \implies (a + b)^2 - (c - d)^2 = 4\left(ab\cos^2\frac{Q}{2} + cd\sin^2\frac{S}{2}\right) = 4(ab + cd)\cos^2\frac{Q}{2}, \]
since $\sin\frac{S}{2} = \cos\frac{Q}{2}$.
From (26.18) and (26.19),
\[ (c + d)^2 - 4cd\cos^2\frac{S}{2} = (a - b)^2 + 4ab\sin^2\frac{Q}{2} \]
\[ \implies (c + d)^2 - (a - b)^2 = 4(ab + cd)\sin^2\frac{Q}{2}, \tag{26.21} \]
since $\cos\frac{S}{2} = \sin\frac{Q}{2}$, as $S$ and $Q$ are supplementary angles. That is,
\[ (a + b + c - d)(a + b - c + d) = 4(ab + cd)\cos^2\frac{Q}{2} \tag{26.22} \]
and
\[ (c + d + a - b)(c + d - a + b) = 4(ab + cd)\sin^2\frac{Q}{2}. \tag{26.23} \]
If we denote
\[ s = \frac{a + b + c + d}{2}, \]
we have
\[ a + b + c - d = a + b + c + d - 2d = 2(s - d). \]
Similarly
\[ a + b + d - c = 2(s - c), \qquad a + c + d - b = 2(s - b), \qquad b + c + d - a = 2(s - a). \]
Substituting in (26.22) and (26.23) and simplifying, we get
\[ 2(s - d) \cdot 2(s - c) = 4(ab + cd)\cos^2\frac{Q}{2} \implies (s - c)(s - d) = (ab + cd)\cos^2\frac{Q}{2}, \tag{26.24} \]
\[ 2(s - a) \cdot 2(s - b) = 4(ab + cd)\sin^2\frac{Q}{2} \implies (s - a)(s - b) = (ab + cd)\sin^2\frac{Q}{2}. \tag{26.25} \]
Multiplying (26.24) and (26.25), we get
\[ (s - a)(s - b)(s - c)(s - d) = (ab + cd)^2 \sin^2\frac{Q}{2}\cos^2\frac{Q}{2} \]
\[ = \left[\frac{1}{2}(ab + cd)\sin Q\right]^2 \]
\[ = \left[\frac{1}{2}ab\sin Q + \frac{1}{2}cd\sin Q\right]^2 \]
\[ = \left[\frac{1}{2}ab\sin Q + \frac{1}{2}cd\sin S\right]^2 \qquad (\because\ \sin Q = \sin S) \]
\[ = (\text{Area of } \triangle PQR + \text{Area of } \triangle PRS)^2 \]
\[ = (\text{Area of the quadrilateral } PQRS)^2. \]
Therefore, the area of the cyclic quadrilateral $PQRS$ is $\sqrt{(s - a)(s - b)(s - c)(s - d)}$.
Thus, we have obtained the Brahmagupta formula for the area stated in the
shloka quoted at the beginning of this section.


Remark 26.1 If we allow the point $S$ to move along the circle toward $P$ and ultimately
coincide with $P$, we have $d \to 0$, the quadrilateral $PQRS$ collapses
to the triangle $PQR$, and Brahmagupta's formula for the area of the cyclic quadrilateral
$PQRS$ reduces to the formula for the area of $\triangle PQR$ on putting $d = 0$.

Therefore, the area of the triangle whose sides are $a, b, c$, with $s = \frac{a + b + c}{2}$, is
\[ \text{Area of } \triangle PQR = \sqrt{(s - a)(s - b)(s - c)\,s}. \]


This is Heron’s formula which is also Bhaskara’s formula for the area of the triangle. 
This occurs as a particular case of the Brahmagupta’s formula for area of the cyclic 
quadrilateral by putting d = 0. 
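
The area formula and its degenerate triangle case can be illustrated with a few lines of code. This is a minimal sketch: the function name is hypothetical, floating-point arithmetic is assumed, and the default `d = 0` is used here only to mirror the remark above.

```python
import math


def brahmagupta_area(a, b, c, d=0.0):
    """Area of a cyclic quadrilateral with sides a, b, c, d.

    With d = 0 the expression reduces to Heron's formula for the
    triangle with sides a, b, c, as noted in Remark 26.1.
    """
    s = (a + b + c + d) / 2  # semi-perimeter
    return math.sqrt((s - a) * (s - b) * (s - c) * (s - d))
```

For example, a 3 by 4 rectangle gives area 12, and the call `brahmagupta_area(3, 4, 5)` gives the area 6 of the 3-4-5 right triangle.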


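We now observe that $Q_c$, $Q_b$, and $Q_d$ have the same area.
Area of $Q_b$ = Area of $\triangle PQS$ + Area of $\triangle QR'S$. By the construction of $PQR'S$, we
know that $\triangle QR'S$ is congruent to $\triangle QSR$ (their sides are $c$, $b$, $y$). Therefore
\[ \text{Area of } \triangle QR'S = \text{Area of } \triangle QSR, \]
\[ \therefore\ \text{Area of } Q_b = \text{Area of } \triangle PQS + \text{Area of } \triangle QSR = \text{Area of } PQRS = \text{Area of } Q_c. \]
Similarly,
\[ \text{Area of } Q_d = \text{Area of } Q_c. \]
Therefore, the areas enclosed by $Q_c$, $Q_b$, and $Q_d$ are equal.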
Now we use the results derived above to find the formula for measures of vertex 
angles of the cyclic quadrilateral. 
Now, from (26.24) and (26.25) above,
\[ \frac{(s - a)(s - b)}{(s - c)(s - d)} = \tan^2\frac{Q}{2} = \cot^2\frac{S}{2}, \]
since $Q$ and $S$ are supplementary angles. Hence
\[ \tan\frac{Q}{2} = \sqrt{\frac{(s - a)(s - b)}{(s - c)(s - d)}}, \]
which gives a formula for $Q$:
\[ Q = 2\arctan\sqrt{\frac{(s - a)(s - b)}{(s - c)(s - d)}}. \]
Likewise
\[ \tan\frac{S}{2} = \sqrt{\frac{(s - c)(s - d)}{(s - a)(s - b)}} \quad\text{or}\quad S = 2\arctan\sqrt{\frac{(s - c)(s - d)}{(s - a)(s - b)}}. \]
If we use the cosine formula for the triangles $\triangle PQS$, $\triangle RQS$ and proceed as before, we
get
\[ \tan\frac{P}{2} = \sqrt{\frac{(s - a)(s - d)}{(s - b)(s - c)}} \]
and
\[ \tan\frac{R}{2} = \sqrt{\frac{(s - b)(s - c)}{(s - a)(s - d)}}, \]
so that
\[ P = 2\arctan\sqrt{\frac{(s - a)(s - d)}{(s - b)(s - c)}} \]
and
\[ R = 2\arctan\sqrt{\frac{(s - b)(s - c)}{(s - a)(s - d)}}. \]


If we do a similar exercise with $Q_b = PQR'S$, we get the vertex (para) angles of
this quadrilateral facing the transcendental diagonal of $PQRS$, which happen to
be the linear pair of angles between the diagonals, as
\[ 2\arctan\sqrt{\frac{(s - a)(s - c)}{(s - b)(s - d)}} \quad\text{and}\quad 2\arctan\sqrt{\frac{(s - b)(s - d)}{(s - a)(s - c)}}. \]
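
The half-angle formulae for the vertex angles can be tried out numerically. In the sketch below the function and key names are illustrative; $P$ is taken as the angle between sides $d$ and $a$, $Q$ between $a$ and $b$, $R$ between $b$ and $c$, and $S$ between $c$ and $d$, following the labelling of this section.

```python
import math


def vertex_angles(a, b, c, d):
    """Vertex angles (in radians) of a cyclic quadrilateral PQRS with
    PQ = a, QR = b, RS = c, SP = d, via the half-angle tangent formulae."""
    s = (a + b + c + d) / 2

    def angle(n1, n2, d1, d2):
        # 2 * arctan( sqrt( (s-n1)(s-n2) / ((s-d1)(s-d2)) ) )
        return 2 * math.atan(math.sqrt((s - n1) * (s - n2) / ((s - d1) * (s - d2))))

    return {
        "P": angle(a, d, b, c),
        "Q": angle(a, b, c, d),
        "R": angle(b, c, a, d),
        "S": angle(c, d, a, b),
    }
```

As a check, the sides 2, 1, 1, 1 (half of a regular hexagon inscribed in a unit circle, with the diameter as base) give base angles of 60° and top angles of 120°, and opposite angles always sum to 180°.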
Thus, when the sides of a cyclic quadrilateral are given, we can find three quadrilaterals
of the same area, and we can also find the vertex angles and
the lengths of the diagonals of these quadrilaterals.

Now we use the sine formula of trigonometry to find the relation between the 
sides, area, and the circumradius of the cyclic quadrilateral. 

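If we apply the sine formula to the triangles $\triangle PQR$, $\triangle PQS$, $\triangle PRS$, $\triangle QRS$,
and $\triangle PR'S$, we get
\[ \frac{x}{\sin Q} = \frac{x}{\sin S} = \frac{y}{\sin P} = \frac{y}{\sin R} = \frac{z}{\sin\varphi} = 2r, \]
where $\varphi$ is the angle between the actual diagonals. We also know that
\[ \text{Area of the cyclic quadrilateral} = \frac{1}{2}\,xy\sin\varphi, \]
so that
\[ \sqrt{(s - a)(s - b)(s - c)(s - d)} = \frac{1}{2}\,xy \cdot \frac{z}{2r} = \frac{xyz}{4r} \]
\[ \implies r = \frac{xyz}{4\sqrt{(s - a)(s - b)(s - c)(s - d)}} \]
\[ \implies r^2 = \frac{x^2 y^2 z^2}{2(s - a) \cdot 2(s - b) \cdot 2(s - c) \cdot 2(s - d)} = \frac{xy \cdot yz \cdot zx}{(a + b + c - d)(a + c + d - b)(b + c + d - a)(a + b + d - c)} \]
\[ \implies r = \sqrt{\frac{(ab + cd)(ad + bc)(ac + bd)}{(a + b + c - d)(a + c + d - b)(b + c + d - a)(a + b + d - c)}}. \]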

This is the formula enunciated by Parameśvara in his commentary on Lilavati (the reader
may refer to [1]).
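
For concreteness, the circumradius formula can be evaluated directly from the four sides. The sketch below is an illustration with a hypothetical function name; it uses the equivalent form with $16(s-a)(s-b)(s-c)(s-d)$ in the denominator and assumes floating-point arithmetic.

```python
import math


def circumradius(a, b, c, d):
    """Circumradius of a cyclic quadrilateral with consecutive sides a, b, c, d:
    r = (1/4) * sqrt( (ab+cd)(ac+bd)(ad+bc) / ((s-a)(s-b)(s-c)(s-d)) )."""
    s = (a + b + c + d) / 2
    numerator = (a * b + c * d) * (a * c + b * d) * (a * d + b * c)
    denominator = (s - a) * (s - b) * (s - c) * (s - d)
    return 0.25 * math.sqrt(numerator / denominator)
```

A 3 by 4 rectangle has its diagonal 5 as a diameter, so the function should return 2.5; the sides 2, 1, 1, 1 (half a regular hexagon on a unit circle) should give 1.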

Finally, we make the following observation. 

In any quadrilateral $PQRS$, the sum of any three sides is greater than the fourth,
i.e.,
\[ a + b + c > d, \qquad b + c + d > a, \qquad c + d + a > b, \qquad d + a + b > c. \]


Now if a, b,c, d are any four numbers satisfying these properties, i.e., sum of any 
three is greater than the fourth, we ask the following converse question: Is there a 
cyclic quadrilateral with a, b, c, d as lengths of its sides? 
We answer this question below.

Call
\[ s = \frac{a + b + c + d}{2}. \]
Then $\sqrt{(s - a)(s - b)(s - c)(s - d)}$ is meaningful because $(s - a)$, $(s - b)$,
$(s - c)$, $(s - d)$ are all positive, since the sum of any three of $a, b, c, d$ is greater than the
fourth.
Call
\[ r = \frac{1}{4}\sqrt{\frac{(ab + cd)(ac + bd)(ad + bc)}{(s - a)(s - b)(s - c)(s - d)}}. \]
This $r$ is a positive number. Draw a circle with this $r$ as radius and choose
an arbitrary point $P$ on it. Mark $Q$, $R$, and $S$ on the circle such that $PQ = a$, $QR = b$,
and $RS = c$. Then, by the monotonicity of $r$ as a function of $PS = d$, we have $PS = d$, and we get the
cyclic quadrilateral $PQRS = Q_c$. By exchanging the face with $b$ or $d$, we get two more
quadrilaterals $Q_b$ and $Q_d$.

Thus, there will be up to congruence three quadrilaterals which are cyclic with 
these sides a, b, c, d. 
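
The construction can be checked numerically: on a circle of radius $r$, a chord of length $\ell$ subtends the central angle $2\arcsin(\ell / 2r)$, and the four chords close up exactly when these angles account for the full circle. The sketch below is an illustration under stated assumptions: the function names and the tolerance are hypothetical, and the case in which the centre lies outside the quadrilateral (so that the largest side subtends the reflex arc) is handled separately.

```python
import math


def circumradius(a, b, c, d):
    """r = (1/4) * sqrt( (ab+cd)(ac+bd)(ad+bc) / ((s-a)(s-b)(s-c)(s-d)) )."""
    s = (a + b + c + d) / 2
    num = (a * b + c * d) * (a * c + b * d) * (a * d + b * c)
    den = (s - a) * (s - b) * (s - c) * (s - d)
    return 0.25 * math.sqrt(num / den)


def chords_close_up(a, b, c, d, tol=1e-7):
    """Check that chords a, b, c, d fit end to end around the circle of
    radius circumradius(a, b, c, d)."""
    r = circumradius(a, b, c, d)
    # Central angle (minor arc) of each chord; clamp guards against rounding.
    angles = [2 * math.asin(min(1.0, side / (2 * r))) for side in (a, b, c, d)]
    total = sum(angles)
    largest = max(angles)
    # Centre inside (or on) the quadrilateral: minor arcs add up to 2*pi.
    centre_inside = math.isclose(total, 2 * math.pi, abs_tol=tol)
    # Centre outside: the largest side spans the reflex arc instead.
    centre_outside = math.isclose(total - largest, largest, abs_tol=tol)
    return centre_inside or centre_outside
```

For instance, the sides 3, 4, 3, 4 close up with the centre inside, while three chords of 40° each together with the chord spanning the same 120° arc close up with the centre outside.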

Further, if we demand that the base is the largest and the face is the smallest, there
is a unique cyclic quadrilateral with $a$ as base, $c$ as face, and $b$ and $d$ as adjacent
sides, with $a \ge b$ and $d \ge c$.

Such a quadrilateral has been considered by Brahmagupta where the base is the 
largest and the face is the shortest and is called the Brahmagupta quadrilateral. 

Brahmagupta in his Brahma-sphuta-siddhanta ([3]) explains a technique of finding 
a rational cyclic quadrilateral (Brahmagupta quadrilateral). 

According to him, if $(a, b, c)$ and $(a', b', c')$ are two rational triples such that
$a^2 + b^2 = c^2$ and $a'^2 + b'^2 = c'^2$, the numbers $ac'$, $bc'$, $a'c$, and $b'c$ are the sides of a
cyclic quadrilateral with the largest among them as base and the shortest as face.

This statement of Brahmagupta can be proved as follows. Since
\[ a^2 + b^2 = c^2 \quad\text{and}\quad a'^2 + b'^2 = c'^2, \]
we have
\[ a^2 c'^2 + b^2 c'^2 = c^2 c'^2 \quad\text{and}\quad a'^2 c^2 + b'^2 c^2 = c^2 c'^2. \]
Therefore, $(ac', bc')$ and $(a'c, b'c)$ are the arms of two right-angled triangles whose
hypotenuse has measure $cc'$.
Now draw two right-angled triangles $ABC$ and $A'BC$ with $BC = cc'$. $AB$ and
$AC$ are of measures $ac'$ and $bc'$, respectively. $A'B$ and $A'C$ are of measures $a'c$ and
$b'c$. Note that $A$ and $A'$ are chosen to be on opposite sides of $BC$, as shown in the figure.
Now clearly $\angle A = 90^\circ = \angle A'$. Therefore, the circle on $BC$ as diameter passes
through $A$ and $A'$, and so $ABA'C$ is a cyclic quadrilateral. In this quadrilateral,
let the largest side be $ac'$ (without loss of generality we can assume this); then
$bc'$ will be the shortest and the face will be $A'C$. If we interchange the face $A'C$ with
the adjacent side $bc'$, we get the Brahmagupta quadrilateral $ABA'C'$.




According to our discussion, the diagonals $AA'$ and $BC'$ of this new quadrilateral (obtained by
exchanging the face with the shortest side of the above quadrilateral) are
at right angles, and $BC$ is the transcendental (para)
diagonal of this new quadrilateral. Its sides are $ac'$ (base), $bc'$ (face), and $a'c$ and
$b'c$ (adjacent sides). If the diagonals intersect at $M$, then $AM = ab'$, $A'M = a'b$,
$MB = aa'$, and $MC' = bb'$, so all sides and diagonals of the quadrilateral are
rational. Such a quadrilateral is known as a Brahmagupta quadrilateral. With this, we end the study
of cyclic quadrilaterals.
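
Brahmagupta's construction is easy to try on concrete triples. The following sketch is an illustration only: the function name and return layout are hypothetical, and the sides are listed in the cyclic order $AB$, $BA'$, $A'C'$, $C'A$ assumed from the discussion above.

```python
from fractions import Fraction


def brahmagupta_quadrilateral(triple1, triple2):
    """Build a rational cyclic quadrilateral with perpendicular diagonals
    from two Pythagorean triples (a, b, c) and (a', b', c').

    Returns (sides, diagonals, area), with sides in cyclic order
    AB = a*c', BA' = a'*c, A'C' = b*c', C'A = b'*c.
    """
    a, b, c = triple1
    a2, b2, c2 = triple2
    if a * a + b * b != c * c or a2 * a2 + b2 * b2 != c2 * c2:
        raise ValueError("both inputs must satisfy a^2 + b^2 = c^2")
    sides = (a * c2, a2 * c, b * c2, b2 * c)
    # Diagonals: AA' = AM + MA' = a*b' + a'*b, BC' = BM + MC' = a*a' + b*b'.
    diagonals = (a * b2 + a2 * b, a * a2 + b * b2)
    # Perpendicular diagonals: area is half the product of the diagonals.
    area = Fraction(diagonals[0] * diagonals[1], 2)
    return sides, diagonals, area
```

With the triples (3, 4, 5) and (5, 12, 13) this yields sides 39, 25, 52, 60 and diagonals 56 and 63; the product of the diagonals equals $39 \cdot 52 + 25 \cdot 60$, in agreement with Theorem 26.1.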

We now pass on to an
appendix, where we discuss the formula for the area of a general quadrilateral (not
necessarily cyclic) and the formulae for its diagonals when its sides are known. We
observe that the formula we arrive at for the area of a general quadrilateral is an
extension of the Brahmagupta formula, which is recovered as a particular
case of the generalized one.


Appendix 


In this appendix, we attempt to obtain a formula for the area of a general
quadrilateral which is not cyclic. Let $ABCD$ be a general noncyclic quadrilateral.

Since a quadrilateral $ABCD$ is cyclic if and only if the sum of the measures of each pair of
opposite angles is $180^\circ$, i.e., $A + C = B + D = 180^\circ$, for a noncyclic
(general) quadrilateral $A + C \ne B + D$. Therefore, either $A + C > B + D$ or
$B + D > A + C$. Without loss of generality, for the sake of definiteness, assume $A + C > B + D$.
Then $A + C - (B + D) = 2\theta$ (say) is positive, i.e., $\theta = \frac{A + C - (B + D)}{2}$ is positive.
The quadrilateral $ABCD$ is cyclic if and only if $\theta = 0$. Therefore, $\theta = \frac{A + C - (B + D)}{2}$
gives a measure of the deviation of the quadrilateral from cyclicity, and we call $\theta$
the cyclic deviation, or measure of deviation from cyclicity, of the quadrilateral.
Then
\[ A + C + B + D = 360^\circ \ (\text{as already known}), \qquad A + C - (B + D) = 2\theta, \]
\[ \therefore\ A + C = \frac{360^\circ + 2\theta}{2} = 180^\circ + \theta \]
and
\[ B + D = \frac{360^\circ - 2\theta}{2} = 180^\circ - \theta. \]
Now, let $AC = x$, $BD = y$, $AB = a$, $BC = b$, $CD = c$, and $DA = d$.
So, the area of the quadrilateral $ABCD$ = Area $\triangle ABC$ + Area $\triangle ADC$
\[ = \frac{1}{2}ab\sin B + \frac{1}{2}cd\sin D. \]
From $\triangle ABC$, using the cosine formula,
\[ x^2 = a^2 + b^2 - 2ab\cos B, \]
so that
\[ x^2 = (a - b)^2 + 2ab(1 - \cos B) = (a - b)^2 + 4ab\sin^2\frac{B}{2}, \tag{26.26} \]
\[ x^2 = (a + b)^2 - 2ab(1 + \cos B) = (a + b)^2 - 4ab\cos^2\frac{B}{2}. \tag{26.27} \]
From $\triangle ADC$, using the cosine rule,
\[ x^2 = c^2 + d^2 - 2cd\cos D, \]
so that
\[ x^2 = (c - d)^2 + 4cd\sin^2\frac{D}{2}, \tag{26.28} \]
\[ x^2 = (c + d)^2 - 4cd\cos^2\frac{D}{2}. \tag{26.29} \]
Using (26.26) and (26.29),
\[ (c + d)^2 - 4cd\cos^2\frac{D}{2} = (a - b)^2 + 4ab\sin^2\frac{B}{2} \]
\[ \implies (c + d)^2 - (a - b)^2 = 4ab\sin^2\frac{B}{2} + 4cd\cos^2\frac{D}{2} \]
\[ \implies (c + d + a - b)(c + d - a + b) = 4ab\sin^2\frac{B}{2} + 4cd\cos^2\frac{D}{2}. \tag{26.30} \]


Using (26.27) and (26.28),
\[ (a + b)^2 - (c - d)^2 = 4ab\cos^2\frac{B}{2} + 4cd\sin^2\frac{D}{2}, \]
\[ (a + b + c - d)(a + b - c + d) = 4ab\cos^2\frac{B}{2} + 4cd\sin^2\frac{D}{2}. \tag{26.31} \]
Now if we denote $a + b + c + d = 2s$, then
\[ a + b + c - d = a + b + c + d - 2d = 2(s - d), \]
\[ a + b + d - c = a + b + c + d - 2c = 2(s - c), \]
\[ a - b + c + d = a + b + c + d - 2b = 2(s - b), \]
\[ b + c + d - a = a + b + c + d - 2a = 2(s - a). \]
Substituting these in (26.30) and (26.31), we get
\[ 4(s - c)(s - d) = 4\left(ab\cos^2\frac{B}{2} + cd\sin^2\frac{D}{2}\right), \tag{26.32} \]
\[ 4(s - a)(s - b) = 4\left(ab\sin^2\frac{B}{2} + cd\cos^2\frac{D}{2}\right). \tag{26.33} \]


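From (26.32) and (26.33), we get
\[ (s - a)(s - b)(s - c)(s - d) = \left(ab\cos^2\frac{B}{2} + cd\sin^2\frac{D}{2}\right)\left(ab\sin^2\frac{B}{2} + cd\cos^2\frac{D}{2}\right) \]
\[ = a^2 b^2 \sin^2\frac{B}{2}\cos^2\frac{B}{2} + c^2 d^2 \sin^2\frac{D}{2}\cos^2\frac{D}{2} + abcd\cos^2\frac{B}{2}\cos^2\frac{D}{2} + abcd\sin^2\frac{B}{2}\sin^2\frac{D}{2} \]
\[ = \left(ab\sin\frac{B}{2}\cos\frac{B}{2} + cd\sin\frac{D}{2}\cos\frac{D}{2}\right)^2 + abcd\left(\cos\frac{B}{2}\cos\frac{D}{2} - \sin\frac{B}{2}\sin\frac{D}{2}\right)^2 \]
\[ = \left(\frac{ab\sin B}{2} + \frac{cd\sin D}{2}\right)^2 + abcd\cos^2\frac{B + D}{2}. \]
We already know
\[ \frac{1}{2}ab\sin B = \text{Area } \triangle ABC, \qquad \frac{1}{2}cd\sin D = \text{Area } \triangle ACD, \]
and
\[ \cos\frac{B + D}{2} = \cos\left(90^\circ - \frac{\theta}{2}\right) = \sin\frac{\theta}{2}, \]
where $\theta$ is the cyclic deviation.
Therefore,
\[ (s - a)(s - b)(s - c)(s - d) - abcd\sin^2\frac{\theta}{2} = (\text{Area of quadrilateral } ABCD)^2, \]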
which implies
\[ \text{Area of the quadrilateral } ABCD = \sqrt{(s - a)(s - b)(s - c)(s - d) - abcd\sin^2\frac{\theta}{2}}. \]
Therefore, we see that to know the area of the quadrilateral $ABCD$, we need the
measure of the cyclic deviation of the quadrilateral in addition to the measures of the four
sides.

We get the Brahmagupta formula for the area of the cyclic quadrilateral as the
particular case $\theta = 0$, which is exactly when the quadrilateral is cyclic.

Therefore, the Brahmagupta formula for the area, $\sqrt{(s - a)(s - b)(s - c)(s - d)}$, is
true only for cyclic quadrilaterals.
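
The generalized area formula can be compared with the direct sum of the two triangle areas. The sketch below is illustrative: the function names are hypothetical, angles are in radians, and the cyclic deviation is computed as $\theta = \pi - (B + D)$, which follows from $B + D = 180^\circ - \theta$ above.

```python
import math


def general_quadrilateral_area(a, b, c, d, B, D):
    """Area from the four sides and the opposite angles B and D (radians):
    sqrt( (s-a)(s-b)(s-c)(s-d) - abcd * sin^2(theta/2) )."""
    s = (a + b + c + d) / 2
    theta = math.pi - (B + D)  # cyclic deviation
    product = (s - a) * (s - b) * (s - c) * (s - d)
    return math.sqrt(product - a * b * c * d * math.sin(theta / 2) ** 2)


def area_from_triangles(a, b, c, d, B, D):
    """Area as the sum of triangles ABC and ADC, for comparison."""
    return 0.5 * a * b * math.sin(B) + 0.5 * c * d * math.sin(D)
```

For example, take $a = 3$, $b = 4$ with $B = 90^\circ$ (so $x = 5$) and $c = d = 5$ with $D = 60^\circ$; both functions should return $6 + 25\sqrt{3}/4$.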

We can use the cosine formula for $\triangle ABD$ and $\triangle CBD$ and get
\[ y^2 = a^2 + d^2 - 2ad\cos A \tag{26.34} \]
\[ \phantom{y^2} = b^2 + c^2 - 2bc\cos C. \tag{26.35} \]

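Then, from the two expressions for $x^2$ given by the cosine formula in $\triangle ABC$ and $\triangle ADC$,
\[ cd\,(a^2 + b^2 - 2ab\cos B) + ab\,(c^2 + d^2 - 2cd\cos D) = (cd + ab)x^2 \]
\[ = cd(a^2 + b^2) + ab(c^2 + d^2) - 2abcd(\cos B + \cos D), \]
or
\[ x^2 = \frac{cd(a^2 + b^2) + ab(c^2 + d^2) - 2abcd(\cos B + \cos D)}{cd + ab} \]
\[ = \frac{(ad + bc)(ac + bd)}{ab + cd} - \frac{4abcd}{ab + cd}\cos\frac{B + D}{2}\cos\frac{B - D}{2} \]
\[ = \frac{(ad + bc)(ac + bd)}{ab + cd} - \frac{4abcd}{ab + cd}\sin\frac{\theta}{2}\cos\frac{B - D}{2}. \tag{26.36} \]
Similarly, from (26.34) and (26.35), we get
\[ (ad + bc)y^2 = (ab + cd)(ac + bd) - 2abcd(\cos A + \cos C), \]
and since $\cos\frac{A + C}{2} = \cos\left(90^\circ + \frac{\theta}{2}\right) = -\sin\frac{\theta}{2}$,
\[ y^2 = \frac{(ab + cd)(ac + bd)}{ad + bc} + \frac{4abcd}{ad + bc}\sin\frac{\theta}{2}\cos\frac{A - C}{2}. \tag{26.37} \]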


Expressions (26.36) and (26.37) give the formulae for the diagonals $x$ and $y$,
from which the Brahmagupta formulae for the diagonals of a cyclic quadrilateral can be
obtained as a particular case by putting $\theta = 0$.
But these formulae can be used only if one angle of the quadrilateral is known in addition
to the four sides.
For if $B$ is known, then $x$ can be found from (26.26). Knowing $x$, we can find $D$
from (26.28). Therefore, knowing $B$, both $\frac{B + D}{2}$ and $\frac{B - D}{2}$ are known. Therefore, $\theta$ is also
known. Now to know $y$, we need the value of $\frac{A - C}{2}$, which can be found using
the following method:
\[ \frac{A - C}{2} = \frac{A_1 - C_1}{2} + \frac{A_2 - C_2}{2}, \]
where $A_1 = \angle CAB$, $A_2 = \angle CAD$, $C_1 = \angle ACB$, $C_2 = \angle ACD$.
Napier's rule applied to $\triangle ABC$ gives
\[ \tan\frac{A_1 - C_1}{2} = \frac{b - a}{b + a}\cot\frac{B}{2}. \]
Napier's rule applied to $\triangle ADC$ gives
\[ \tan\frac{A_2 - C_2}{2} = \frac{c - d}{c + d}\cot\frac{D}{2}. \]


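Therefore
\[ \sec^2\frac{A_1 - C_1}{2} = 1 + \left(\frac{b - a}{b + a}\right)^2\cot^2\frac{B}{2} = \frac{(b + a)^2\sin^2\frac{B}{2} + (b - a)^2\cos^2\frac{B}{2}}{(b + a)^2\sin^2\frac{B}{2}}, \]
so that
\[ \cos^2\frac{A_1 - C_1}{2} = \frac{(b + a)^2\sin^2\frac{B}{2}}{(b + a)^2\sin^2\frac{B}{2} + (b - a)^2\cos^2\frac{B}{2}}. \]
Note that $(b + a)^2\sin^2\frac{B}{2} + (b - a)^2\cos^2\frac{B}{2} = a^2 + b^2 - 2ab\cos B = x^2$, so that
\[ \cos\frac{A_1 - C_1}{2} = \frac{(b + a)\sin\frac{B}{2}}{x}, \]
\[ \sin^2\frac{A_1 - C_1}{2} = 1 - \cos^2\frac{A_1 - C_1}{2} = \frac{(b - a)^2\cos^2\frac{B}{2}}{x^2} \implies \sin\frac{A_1 - C_1}{2} = \frac{(b - a)\cos\frac{B}{2}}{x}, \]
where the sign of the sine is fixed by Napier's rule, since the sine is the product of the tangent and the cosine.

Similarly, we get from Napier's formula applied to $\triangle ADC$
\[ \cos\frac{A_2 - C_2}{2} = \frac{(c + d)\sin\frac{D}{2}}{x}, \qquad \sin\frac{A_2 - C_2}{2} = \frac{(c - d)\cos\frac{D}{2}}{x}. \]
From these two results, we get
\[ \cos\frac{A - C}{2} = \cos\frac{A_1 - C_1}{2}\cos\frac{A_2 - C_2}{2} - \sin\frac{A_1 - C_1}{2}\sin\frac{A_2 - C_2}{2} \]
\[ = \frac{(a + b)(c + d)\sin\frac{B}{2}\sin\frac{D}{2} - (b - a)(c - d)\cos\frac{B}{2}\cos\frac{D}{2}}{x^2} \]
\[ = \frac{\bigl[(ac + bd) + (ad + bc)\bigr]\sin\frac{B}{2}\sin\frac{D}{2} + \bigl[(ac + bd) - (ad + bc)\bigr]\cos\frac{B}{2}\cos\frac{D}{2}}{x^2} \]
\[ = \frac{(ac + bd)\left(\cos\frac{B}{2}\cos\frac{D}{2} + \sin\frac{B}{2}\sin\frac{D}{2}\right) - (ad + bc)\left(\cos\frac{B}{2}\cos\frac{D}{2} - \sin\frac{B}{2}\sin\frac{D}{2}\right)}{x^2} \]
\[ = \frac{(ac + bd)\cos\frac{B - D}{2} - (ad + bc)\cos\frac{B + D}{2}}{x^2} \]
\[ = \frac{(ac + bd)\cos\frac{B - D}{2} - (ad + bc)\sin\frac{\theta}{2}}{x^2}. \]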


Since $B$ and $D$ are already known, $\cos\frac{A - C}{2}$ is known. Therefore, $y^2$ can be found from (26.37). Thus,
$y$ is also known.

Hence, if $a, b, c, d, B$ or $a, b, c, d, x$ are given, we have the above formulae to find
all the remaining elements, namely, $A$, $C$, $B$, $D$ and $\theta$.
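
The procedure just described (from $a, b, c, d, B$ to $x$, $D$, $\theta$, and then $y$) can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function and key names are hypothetical, angles are in radians, $\theta$ is computed as $\pi - (B + D)$, and the expression used for $\cos\frac{A - C}{2}$ is $\bigl[(ac + bd)\cos\frac{B - D}{2} - (ad + bc)\sin\frac{\theta}{2}\bigr] / x^2$, whose sign convention takes $\sin\frac{A_1 - C_1}{2} = (b - a)\cos\frac{B}{2} / x$ in line with Napier's rule.

```python
import math


def solve_quadrilateral(a, b, c, d, B):
    """Given sides AB = a, BC = b, CD = c, DA = d and angle B (radians),
    return the diagonals x = AC, y = BD, the angle D and the deviation theta."""
    # Diagonal AC from the cosine rule in triangle ABC.
    x2 = a * a + b * b - 2 * a * b * math.cos(B)
    # Angle D from the cosine rule in triangle ADC.
    D = math.acos((c * c + d * d - x2) / (2 * c * d))
    theta = math.pi - (B + D)  # cyclic deviation, since B + D = pi - theta
    sin_half_theta = math.sin(theta / 2)
    # cos((A - C)/2) via the half-angle results from Napier's rule
    # (sign convention as stated in the lead-in; an assumption of this sketch).
    cos_half_ac = ((a * c + b * d) * math.cos((B - D) / 2)
                   - (a * d + b * c) * sin_half_theta) / x2
    # Diagonal BD from (26.37).
    y2 = ((a * b + c * d) * (a * c + b * d)
          + 4 * a * b * c * d * sin_half_theta * cos_half_ac) / (a * d + b * c)
    return {"x": math.sqrt(x2), "y": math.sqrt(y2), "D": D, "theta": theta}
```

As a check, for $a = 3$, $b = 4$, $c = d = 5$ and $B = 90^\circ$, placing $B$ at the origin with $A = (3, 0)$ and $C = (0, 4)$ puts $D$ at the apex of an equilateral triangle on $AC$, and a direct coordinate computation gives $y^2 = 25 + 12\sqrt{3}$.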